From #pixels to objects: how the #brain builds rich representations of the natural world
Posted on April 15th, 2017
04/06/2017 @RutgersUniversity, Easton Hub Auditorium, Fiber Optics Building, Busch Campus
Jack Gallant @UCBerkeley presented a survey of current research on mapping the neurophysiology of the visual system in the brain. He first talked about the overall view of visual processing since the Felleman and Van Essen article in Cerebral Cortex in 1991. Their work on the macaque monkey showed that any brain area has a 50% chance of being connected to any other part of the brain. Visual processing can be split into 3 levels:
1. Early visual areas – 2. Intermediate visual areas – 3. High-level visual areas
These areas are linked by pooling nonlinear transformations (the inspiration for the nonlinear mappings in convolutional neural nets, CNNs). The visual areas were identified by retinotopic maps: about 60 areas in humans, with macaques having 10 to 15 areas in the V1 area.
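A single stage of this hierarchy can be sketched as linear filtering, a rectifying nonlinearity, and pooling, which is also the basic building block of a CNN. The NumPy sketch below only illustrates that idea and is not a model of any particular visual area; the edge filter, the 9x9 toy image, and the 2x2 max-pooling window are arbitrary assumptions.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, the operation a CNN convolutional layer computes."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Rectifying nonlinearity."""
    return np.maximum(x, 0.0)


def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/columns that don't fill a window are dropped."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    x = x[:h, :w]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))


def visual_stage(image, kernel):
    """One stage: filter, rectify, pool."""
    return max_pool(relu(conv2d_valid(image, kernel)))


# Toy input: a 9x9 image with a vertical dark-to-light edge at column 5.
image = np.zeros((9, 9))
image[:, 5:] = 1.0
edge_filter = np.array([[-1.0, 1.0]])  # responds to dark-to-light transitions
features = visual_stage(image, edge_filter)
print(features)
```

Stacking several such stages, each pooling over the output of the previous one, gives the kind of feedforward hierarchy described above.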
Another important contribution was by David J. Field, who argued that the mammalian visual system can only be understood relative to the images it is exposed to. In addition, natural images have a very specific structure, a roughly 1/f fall-off in the amplitude spectrum (1/f² in the power spectrum), due to occlusion and to the fact that objects can be viewed from any angle (see Olshausen & Field, American Scientist, 2000).
This led to research characterizing natural images by the correlations between pairs of points. Beyond pairs of points, that approach becomes too computationally intensive. In summary, natural images are only a small part of the universe of possible images (most of which humans would classify as white noise).
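Both statistics can be checked numerically. The NumPy sketch below (image size, number of frequency rings, and seeds are arbitrary assumptions) synthesizes an image whose amplitude spectrum falls as 1/f, the form usually reported for natural images, and compares it with white noise on two measures: the slope of the ring-averaged power spectrum (about -2 for the 1/f image, about 0 for white noise) and the correlation between pairs of pixels at increasing separation.

```python
import numpy as np


def radial_freq(n):
    """Spatial frequency (cycles/pixel) of each FFT coefficient of an n x n image."""
    fx = np.fft.fftfreq(n)
    return np.hypot(fx[:, None], fx[None, :])


def one_over_f_image(n=256, seed=0):
    """White noise filtered so its amplitude spectrum falls as 1/f (power as 1/f^2)."""
    rng = np.random.default_rng(seed)
    f = radial_freq(n)
    f[0, 0] = 1.0  # leave the mean (DC term) unscaled rather than divide by zero
    return np.real(np.fft.ifft2(np.fft.fft2(rng.standard_normal((n, n))) / f))


def power_spectrum_slope(img, n_bins=16):
    """Slope of log power vs log frequency, averaging power within log-spaced rings."""
    n = img.shape[0]
    power = np.abs(np.fft.fft2(img - img.mean())).ravel() ** 2
    f = radial_freq(n).ravel()
    ring_of = np.digitize(f, np.geomspace(2.0 / n, 0.5, n_bins))
    log_f, log_p = [], []
    for ring in range(1, n_bins):
        members = ring_of == ring
        if members.any():
            log_f.append(np.log(f[members].mean()))
            log_p.append(np.log(power[members].mean()))
    slope, _intercept = np.polyfit(log_f, log_p, 1)
    return slope


def pair_correlation(img, max_shift=6):
    """Correlation between pixels d = 1..max_shift apart horizontally."""
    z = (img - img.mean()) / img.std()
    return np.array([np.mean(z[:, :-d] * z[:, d:]) for d in range(1, max_shift + 1)])


white = np.random.default_rng(1).standard_normal((256, 256))
natural_like = one_over_f_image()
for name, img in [("white noise", white), ("1/f image", natural_like)]:
    print(f"{name}: slope {power_spectrum_slope(img):.2f},",
          "pair correlations", np.round(pair_correlation(img), 2))
```

White noise has essentially zero correlation between any two distinct pixels, while the 1/f image keeps strong correlations between nearby pixels that fade with distance.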
Until 2012, researchers needed to hand-specify the features used to identify items in images, but AlexNet showed that many images could be classified using multi-layer models, faster computation, and lots of data (see the review by LeCun, Bengio & Hinton, Nature, 2015). These deep neural nets work well, but the reasons for their success have yet to be worked out (he estimates it will take 5 to 10 years for the math to catch up).
One interesting exercise is running a CNN and then looking for corresponding activation in the brain: mapping the convolutional and feature layers of the network onto the corresponding layers of the visual cortex. This reveals that V1 has bi- or tri-phasic functions – Gabor functions at different orientations. This is highly efficient, since a sparse code needs to activate as few neurons as possible.
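A minimal sketch of the Gabor description of V1: a small bank of Gabor filters at four orientations applied to an oriented grating. The filter whose orientation matches the grating responds far more strongly than the others. The wavelength, envelope width, and 21x21 size are illustrative assumptions, not fitted values.

```python
import numpy as np


def coords(size):
    """(y, x) coordinate grids centred on 0."""
    half = size // 2
    return np.mgrid[-half:half + 1, -half:half + 1]


def gabor(size, theta, wavelength=6.0, sigma=3.0, phase=0.0):
    """Gaussian-windowed sinusoid varying along direction theta."""
    y, x = coords(size)
    along = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * along / wavelength + phase)


def grating(size, theta, wavelength=6.0):
    """Full-field sinusoidal grating with the same orientation convention."""
    y, x = coords(size)
    return np.cos(2 * np.pi * (x * np.cos(theta) + y * np.sin(theta)) / wavelength)


orientations = np.deg2rad([0, 45, 90, 135])
bank = [gabor(21, t) for t in orientations]
stimulus = grating(21, np.deg2rad(90))
responses = np.array([abs(np.sum(f * stimulus)) for f in bank])
print(dict(zip([0, 45, 90, 135], np.round(responses, 2))))
```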
Next they used motion-energy models (Shinji Nishimoto) to see how motion is detected in voxels in V1. Using Utah arrays to monitor single neurons, they determined that monitoring takes 10 to 20 ms. They have animals watch movies and analyze the input images using a combination of simple- and complex-cell models (built with Keras) to model neurons in V1 and V2 on a 16 ms time scale.
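The motion-energy idea can be sketched in one spatial dimension plus time, in the style of Adelson and Bergen's model: two space-time Gabor filters in quadrature phase act like simple cells, and the sum of their squared outputs acts like a complex cell whose response depends on drift direction but not on the stimulus phase. This is an illustrative NumPy sketch, not the group's Keras model; all filter parameters are assumptions.

```python
import numpy as np

# Space-time grid: rows are time steps, columns are positions (1-D space for simplicity).
t_grid, x_grid = np.mgrid[-32:33, -32:33].astype(float)


def st_gabor(direction, phase, sf=0.1, tf=0.1, sigma=8.0):
    """Space-time Gabor tuned to drift direction +1 (rightward) or -1 (leftward)."""
    envelope = np.exp(-(x_grid ** 2 + t_grid ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * (sf * x_grid - direction * tf * t_grid) + phase)


def motion_energy(stimulus, direction):
    """Sum of squared quadrature-pair ('simple cell') outputs = a 'complex cell' response."""
    even = np.sum(st_gabor(direction, 0.0) * stimulus)
    odd = np.sum(st_gabor(direction, np.pi / 2) * stimulus)
    return even ** 2 + odd ** 2


def drifting_grating(direction, phase=0.7, sf=0.1, tf=0.1):
    """Sinusoidal grating drifting right (+1) or left (-1) over time."""
    return np.cos(2 * np.pi * (sf * x_grid - direction * tf * t_grid) + phase)


for d, name in [(+1, "rightward"), (-1, "leftward")]:
    s = drifting_grating(d)
    print(f"{name} stimulus: R = {motion_energy(s, +1):.1f}, L = {motion_energy(s, -1):.3g}")
```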
High level visual areas
Jack then talked about research identifying neurons in high-level visual areas that respond to specific stimuli. Starting with fMRI, his group (Huth, Nishimoto, Vu & Gallant, Neuron, 2012) has identified many categories: face areas vs. objects; place minus face. By presenting images and mapping which voxels in the brain are activated, one can see how the 2000 categories are mapped in the brain, using WordNet for the labels. Similar concepts are mapped to similar locations in the brain, but specific items in the semantic visual system interact with the semantic language areas – so a ‘dog’ can activate many areas, letting it be used in different ways and unified as needed. Each person will have a different mapping depending on their previous good and bad experiences with dogs.
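Maps like these are typically built with voxelwise encoding models. Each voxel's response is regressed on features describing the stimulus (here, which categories are on screen), and the fitted weights give that voxel's category tuning. The sketch below uses closed-form ridge regression on synthetic data. The number of categories and voxels, the noise level, and alpha are all assumptions; the group's actual features and regularization are not described in the talk.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha*I)^-1 X^T Y, one column per voxel."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)


rng = np.random.default_rng(0)
n_time, n_categories, n_voxels = 500, 20, 50

# Design matrix: which (hypothetical) categories are on screen at each time point.
X = (rng.random((n_time, n_categories)) < 0.2).astype(float)
# Synthetic "true" category tuning of each voxel, plus measurement noise.
W_true = rng.standard_normal((n_categories, n_voxels))
Y = X @ W_true + 0.5 * rng.standard_normal((n_time, n_voxels))

train, test = slice(0, 400), slice(400, None)
W_hat = fit_ridge(X[train], Y[train], alpha=1.0)
pred = X[test] @ W_hat
r = np.array([np.corrcoef(pred[:, v], Y[test][:, v])[0, 1] for v in range(n_voxels)])
print("median held-out correlation:", round(float(np.median(r)), 3))
```

Each column of `W_hat` is one voxel's estimated tuning across categories; comparing these columns across voxels is what reveals that similar concepts land in similar places.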
He talked about other topics, including the challenges of determining how different kinds of information are represented in different places: Fourier power, object categories, subjective distance. In order to activate any of these areas in isolation, one needs enough stimulus to activate the earlier layers. They have made progress by building decoders from the voxel models that run backwards from the brain area to reconstruct the stimulus. A blood-flow model is used, with a 2-second minimum sampling period, but there is a lot of continuity between frames, so they can reconstruct a series of images.
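The blood-flow (hemodynamic) model can be sketched as convolving the neural time course with a hemodynamic response function (HRF) and then sampling every 2 seconds. The double-gamma shape below (gamma shapes 6 and 16, undershoot ratio 1/6) is a commonly used form chosen here as an assumption; the talk did not specify the group's HRF.

```python
import math

import numpy as np


def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=1.0 / 6.0):
    """Difference of two gamma densities: a positive peak followed by a smaller undershoot."""
    t = np.asarray(t, dtype=float)

    def gamma_pdf(shape):
        return t ** (shape - 1) * np.exp(-t) / math.gamma(shape)

    h = gamma_pdf(peak_shape) - undershoot_ratio * gamma_pdf(undershoot_shape)
    return h / h.max()


dt = 0.1                                   # simulation step (s)
hrf = double_gamma_hrf(np.arange(0.0, 30.0, dt))
stimulus = np.zeros(600)                   # 60 s of neural activity
stimulus[100:120] = 1.0                    # a 2 s event starting at t = 10 s
bold = np.convolve(stimulus, hrf)[: len(stimulus)] * dt

tr = 2.0                                   # one fMRI volume every 2 s
samples = bold[:: round(tr / dt)]
print(np.round(samples, 3))
```

The sampled response peaks several seconds after the event, which is why reconstructing fast-changing images leans on the continuity between frames.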
Intermediate visual areas
Intermediate visual areas, between the lower and higher levels of processing, are hard to understand; he looked at V4. Neurons there respond to shapes of intermediate complexity, such as curvature, but not much else. Using fMRI they know which image features correlate with specific areas, but there is no strong indication differentiating one layer from another. With Utah array recordings, they need to apply a log-polar transform to improve prediction in V4. Using a receptive field model, they can create a predictor frame and match brain activity to the images that gave the largest response.
However, the images are messy, and predicting V4 is not the same as understanding V4.
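A log-polar transform resamples an image on rings whose radii grow geometrically from the center. Zooming about the center then becomes (approximately, up to interpolation) a shift along the radius axis, and rotation a circular shift along the angle axis. The sketch below uses nearest-neighbour sampling; the grid sizes and minimum radius are assumptions, and how the group applied the transform to V4 stimuli is not described here.

```python
import numpy as np


def log_polar(image, n_radii=32, n_angles=64, r_min=1.0):
    """Nearest-neighbour resampling of an image onto a log-polar grid about its centre.

    Rows are log-spaced radii, columns are equally spaced angles.
    """
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radii = np.geomspace(r_min, min(cy, cx), n_radii)
    angles = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    ys = np.clip(np.rint(cy + rr * np.sin(aa)).astype(int), 0, h - 1)
    xs = np.clip(np.rint(cx + rr * np.cos(aa)).astype(int), 0, w - 1)
    return image[ys, xs], radii


# Check: an image whose value is each pixel's distance from the centre should map
# to rows that are (nearly) constant and equal to that row's radius.
y, x = np.mgrid[0:65, 0:65]
distance = np.hypot(y - 32.0, x - 32.0)
lp, radii = log_polar(distance)
print(lp.shape, np.abs(lp - radii[:, None]).max())
```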
Finally, he talked about attentional and tuning effects on single neurons. In an experiment in which subjects watched a movie and were asked to search for either humans or vehicles, the semantic map changed depending on the search target. These tuning-shift effects are a function of distance to the visual periphery: attentional effects are small in V1 and get larger in the ensuing layers.
In the Q&A, he made the following points:
- The visual word form area in the brain becomes active as you learn to read. This change does not occur for people who are illiterate.
- One of the experimental assumptions is that the system is stationary, so there is no adaptation. If adaptation does occur, then they cannot compute a noise ceiling for the signals.
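One common way to estimate a noise ceiling is split-half reliability across repeated presentations of the same stimuli. The calculation treats every repeat as a draw from the same distribution, which is exactly the stationarity assumption mentioned above; if responses adapt across repeats, the estimate stops being meaningful. The synthetic data (10 repeats, equal signal and single-trial noise variance) are assumptions for illustration.

```python
import numpy as np


def split_half_reliability(responses):
    """responses: (n_repeats, n_stimuli) from one voxel or neuron, same stimuli each repeat.

    Correlates the mean of odd repeats with the mean of even repeats, then applies the
    Spearman-Brown correction to estimate the reliability of the mean over all repeats.
    """
    odd = responses[0::2].mean(axis=0)
    even = responses[1::2].mean(axis=0)
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)


rng = np.random.default_rng(0)
signal = rng.standard_normal(2000)                    # stimulus-driven response
repeats = signal + rng.standard_normal((10, 2000))    # 10 stationary, noisy repeats
reliability = split_half_reliability(repeats)
ceiling = np.sqrt(reliability)  # best correlation a perfect model could reach with the mean
print(round(float(reliability), 3), round(float(ceiling), 3))
```

With these settings the reliability should come out near 1/(1 + 0.1) ≈ 0.91, since averaging 10 repeats divides the noise variance by 10.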
[Neural nets take inspiration from the neurobiology, especially the creation of convolutional neural nets, but there is now feedback with neurobiology using the tools created in machine learning to explore possible models of brain mapping. Does the pervasive existence of Gabor filters lead to an argument that their presence indicates that natural images are closely allied with fractal patterns?]
#Driverless #Trucks will come before driverless #cars
Posted on April 13th, 2017
04/12/2017 @MetroTech 6, NYU, Brooklyn, NY
Seth Clevenger – technology editor, Transport Topics News, @sethclevenger, talked about the rollout of driverless trucks. His main message was that there are many intermediate stages from adaptive cruise control (already exists in some cars) to fully autonomous operation.
Truck manufacturers are concentrating on systems that assist rather than replace drivers. These include
- Truck platooning – could roll out by year-end. Braking is synchronized and trucks can draft off each other for a 10% increase in efficiency. Brakes are linked, but the trucks still need drivers (Peloton Technology plans to begin fleet trials).
- Connected vehicles (V2V, V2I) – just starting to be regulated. For instance, safety messages sent by each vehicle.
- auto docking at loading docks
- traffic jam assist – move forward slowly without driver assistance
Startups include: Uber/Otto, Embark, Starsky Robotics, Drive.ai
[One of my major concerns is the integrity of the software controlling the vehicle. A failure in software could cause accidents; however, my main concern is the potential insertion of a malicious virus as a sleeper cell within the millions of lines of code. In this case, the results could be catastrophic, as all braking and acceleration systems could be programmed to fail on a specific date in the future. At that moment, all vehicles on the road would be out of control, potentially resulting in millions of accidents and thousands of deaths and injuries. Preventing such an event will require coordination among suppliers and enforcement of strict software standards. The large number of suppliers makes this job especially complicated. This sleeper cell could lie dormant for years before it is activated.]
Applications of #DeepLearning in #Healthcare
Posted on March 28th, 2017
03/28/2017 @NYU Courant Institute (251 Mercer St, New York, NY)
Sumit Chopra, the head of A.I. Research @Imagen Technologies, introduced the topic by saying that the two areas in our lives that will be most affected by AI are healthcare and driverless cars.
Healthcare data can be divided into
- Payer data – from the insurance provider; more complete (unless the patient switches payers), but less detailed.
- Clinical data – incomplete, since hospitals don’t share their datasets; in digital form, with privacy concerns.
- Other – cell phones, etc.
He focuses on medical imaging, mainly diagnostic radiology: 600 million studies in the U.S., but a shortage of skilled radiologists, and errors are prevalent. The images are very large, high resolution, and low contrast, with highly subtle cues => radiology is hard to do well.
Possible solution: take a standard pre-trained model (Alexnet/VGG/…) and fine-tune it on a small number of images, but this might not work since the signal is subtle.
Also radiology reports, which could be used for supervised training, are unstructured and it’s hard to tell what the report tells you. => weak labels at best
Much work has been done on this problem, usually using deep convolutional neural nets.
First step: image registration = rotate & crop.
Train a deep convolutional network (the registration network), then send its output to a detection network for binary segmentation.
Could use generative models for images to train doctors
Leverage different modalities of data
Sumit has found that a random search of hyperparameter space works better than either grid search or optimizer search.
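One standard explanation for this (Bergstra & Bengio, 2012) is that when only a few hyperparameters matter, random search tries many more distinct values of each one for the same budget. The toy objective below is hypothetical: it depends strongly on the log learning rate and hardly at all on dropout. With nine trials, a 3 x 3 grid tests only three learning rates, while random search tests nine.

```python
import itertools
import random


def validation_score(log10_lr, dropout):
    """Toy objective (hypothetical): depends strongly on log10_lr, barely on dropout."""
    return -(log10_lr + 2.7) ** 2 - 0.01 * (dropout - 0.5) ** 2


budget = 9
grid = list(itertools.product([-5.0, -3.0, -1.0], [0.1, 0.5, 0.9]))  # 3 x 3 grid
rng = random.Random(0)
randomized = [(rng.uniform(-5.0, -1.0), rng.uniform(0.0, 1.0)) for _ in range(budget)]

for name, trials in [("grid", grid), ("random", randomized)]:
    best = max(trials, key=lambda p: validation_score(*p))
    distinct_lr = len({lr for lr, _ in trials})
    print(f"{name}: {distinct_lr} distinct learning rates, best score {validation_score(*best):.3f}")
```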
Building an #AI #AutonomousAgent using #SupervisedLearning with @DennisMortensen
Posted on March 23rd, 2017
03/21/2017 @ Rise, 43 West 23rd Street, NY, 2nd floor
In mid-2013, Dennis Mortensen started x.ai to use machine learning to set up meetings. After an introduction to the software, Dennis talked about the challenges of creating a conversational agent to act as your assistant setting up business meetings.
He talked about the 3 processes within the agent: NLU + reasoning + NLG
Natural Language Understanding needs to define the universe – what is it we can do and what is it that we cannot do and will not do?
Natural Language Understanding (NLU) Challenges
- Define intents then hire AI trainers. Need to get the intents right since it’s expensive to change to a different scheme
- What data set do we align to? What are the guidelines for labeling? Coders need to learn and remember the rules defining all the intents. Need to keep it compact, but not too compact.
- They have 101 AI trainers full time. On what software do they label the words? Need a custom-built annotation platform. Spent 2 years building it.
- How do people want the agent to behave? Manually determine what is supposed to happen. This will create a new intent, but this often requires changes in the coding of the NLU
- Some of the things humans want to do are very complicated. Especially common sense
- don’t schedule meetings after 6:00, but if there is one at 6:15, there is a reason for it.
- a 6:30 PM call to Singapore might be a good idea.
- When to have a meeting and when to have a phone call
Natural Language Generation (NLG) challenges
- They have 2 interaction designers
- Need to inject empathy if it’s appropriate. For instance if there is a change in schedule, we need to respond appropriately: understanding initially and more assertive if the change needs to be unchanged. Also need to honor requests to speak in a given language.
They evaluate the performance of the software when being used by a client by
- customer-centric metrics, such as the number of schedule changes
- is the customer happy?
NYAI#7: #DataScience to Operationalize #ML (Matthew Russell) & Computational #Creativity (Dr. Cole)
Posted on November 22nd, 2016
11/22/2016 Rise, 43 West 23rd Street, NY 2nd floor
Speaker 1: Using Data Science to Operationalize Machine Learning – (Matthew Russell, CTO at Digital Reasoning)
Speaker 2: Top-down vs. Bottom-up Computational Creativity – (Dr. Cole D. Ingraham DMA, Lead Developer at Amper Music, Inc.)
Matthew Russell @DigitalReasoning spoke about understanding language using NLP, relationships among entities, and temporal relationships. For human language understanding, he views technologies such as knowledge graphs and document analysis as becoming commoditized. The only way to get an advantage is to improve the efficiency of using ML: the KPI for data analysis is the number of experiments (tests of a hypothesis) that can be run per unit time. The key is to use tools such as:
- Vagrant – reproducible environment setup
- Jupyter Notebook – like a lab notebook
- Git – version control
- Automation –
He wants highly repeatable experiments. The goal is to speed up the number of experiments that can be conducted per unit time.
He then talked about using machines to read medical reports and determine the issues. Negatives can be extracted, but issues are harder to find. He uses an ontology to classify entities.
He talked about experiments on models using ontologies. Whether a fixed ontology works depends on the content: the ontology of terms for anti-terrorism evolves and needs to be experimentally adjusted over time. The medical ontology is probably the most static.
In the second presentation, Cole D. Ingraham @Ampermusic talked about top-down vs. bottom-up creativity in the composition of music. Music differs from other audio forms since it has a great deal of large-scale structure as well as smaller-scale structure. ML does well at generating good audio over a short time frame, but Cole thinks it is better to apply theories from music to create the larger whole. This is a combination of
Top-down: novel & useful, rejects previous ideas – code driven, “hands on”, you define the structure
Bottom-up: data driven, “hands off”, you learn the structure
He then talked about music composition at the intersection of Generation vs. analysis (of already composed music) – can do one without the other or one before the other
To successfully generate new and interesting music, one needs to generate variance. Composing music using a purely probabilistic approach is problematic, as there is a lack of structure. He likes an approach similar to replacing words with their synonyms, which does not fundamentally change the meaning of the sentence but still makes it different and interesting.
It’s better to work on deterministically defined variance than it is to weed out undesired results from nondeterministic code.
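Deterministically defined variance can be illustrated with the classic motif transformations of music theory: transposition, inversion, and retrograde. Each produces a recognizably related but different phrase, much like swapping in synonyms, and the same input always gives the same output. This is my illustrative choice, not Amper's code; the motif and the MIDI pitch numbering are assumptions.

```python
def transpose(motif, semitones):
    """Shift every pitch by the same interval."""
    return [p + semitones for p in motif]


def invert(motif):
    """Mirror every interval around the first note."""
    first = motif[0]
    return [first - (p - first) for p in motif]


def retrograde(motif):
    """Play the motif backwards."""
    return list(reversed(motif))


motif = [60, 62, 64, 67]  # MIDI pitches: C4, D4, E4, G4
variations = {
    "original": motif,
    "transposed +5": transpose(motif, 5),
    "inverted": invert(motif),
    "retrograde": retrograde(motif),
    "retrograde inversion": retrograde(invert(motif)),
}
for name, notes in variations.items():
    print(f"{name:>20}: {notes}")
```

Because every transformation is a pure function, the variation a listener hears is controlled by which transformations the code applies, not by filtering random output after the fact.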
As an example, he talked about WaveNet (a Google DeepMind project), whose input and output are raw audio. This approach works well for improving speech synthesis, but less well for music generation, as there is no large-scale structural awareness.
Cole then talked about Amper, a web site that lets users create music with no experience required: fast, believable, collaborative.
They like a mix of top-down and bottom-up approaches:
- Want speed, but neural nets are slow
- Music has a lot of theory behind it, so it’s best to let the programmers code these rules
- Can change different levels of the hierarchical structure within music: style, mood, can also adjust specific bars
Runtime written in Haskell – a functional language, so it’s great for music
NYAI#5: Neural Nets (Jason Yosinski) & #ML For Production (Ken Sanford)
Posted on August 24th, 2016
08/24/2016 @Rise 43 West 23rd Street, NY, 2nd floor
Jason Yosinski @GeometricTechnology spoke about his work on #NeuralNets to generate pictures. He started by talking about machine learning with feedback to train a robot to move more quickly, and about using feedback to computer-generate pictures that are appealing to humans.
Jason next talked about AlexNet, based on work by Krizhevsky et al. 2012, which classifies images using a neural net with 5 convolutional layers (interleaved with max pooling and contrast layers) plus 3 fully connected layers at the end. The net, with 60 million parameters, was trained on ImageNet, which contains over 1 million images. His image classification code is available at http://Yosinski.com.
Jason talked about how the classifier thinks about categories when it is not being trained to identify that category. For instance, the network may learn about faces even though there is no human category since it helps the system detect things such as hats (above a face) to give it context. It also identifies text to give it context on other shapes it is trying to identify.
He next talked about generating images by starting from random noise and randomly changing pixels. Some changes will cause the confidence in the goal (such as ‘lion’) to increase. Over many random moves, the goal’s confidence level increases. Jason showed many random images that elicited high levels of confidence, but the images often looked like purple-green slime. This is probably because the network, while learning, immediately discards the overall color of the image and is therefore insensitive to aberrations from normal colors. (See Erhan et al 2009)
[This also raises the question of how computer vision is different from human vision. If presented with a blue colored lion, the first reaction of a human might be to note how the color mismatches objects in the ‘lion’ category. One experiment would be to present the computer model with the picture of a blue lion and see how it is classified. Unlike computers, humans encode information beyond their list of items they have learned and this encoding includes extraneous information such as color or location. Maybe the difference is that humans incorporate a semantic layer that considers not only the category of the items, but other characteristics that define ‘lion-ness’. Color may be more central to human image processing as it has been conjectured that we have color vision so we can distinguish between ripe and rotten fruits. Our vision also taps into our expectation to see certain objects within the world and we are primed to see those objects in specific contexts, so we have contextual information beyond what is available to the computer when classifying images.]
To improve the generated pictures of ‘lions’, he next used a generator to create pictures and change them until they get a picture with high confidence of being a ‘lion’. The generator is designed to create identifiable images. The generator can even produce pictures of objects that it has not been trained to paint. (Need to apply regularization to get better pictures for the target.)
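The random pixel-change procedure can be sketched as hill climbing: propose a random change to one pixel and keep it only if the target class's confidence rises. The "classifier" below is a hypothetical random linear softmax model standing in for a trained network, and the step size and iteration count are assumptions. Nothing keeps the pixels in a natural range, which is one way to see why such images can look like slime and why regularization helps.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


rng = np.random.default_rng(0)
n_pixels, n_classes, target = 64, 5, 2          # target = index of the hypothetical 'lion' class
W = rng.standard_normal((n_classes, n_pixels))  # stand-in for a trained classifier's weights


def confidence(img):
    """Softmax probability the toy classifier assigns to the target class."""
    return softmax(W @ img)[target]


img = 0.1 * rng.standard_normal(n_pixels)       # start from random noise
start = confidence(img)
for _ in range(2000):
    candidate = img.copy()
    pixel = rng.integers(n_pixels)
    candidate[pixel] += rng.normal(scale=0.5)   # random change to one pixel
    if confidence(candidate) > confidence(img): # keep it only if 'lion' confidence rises
        img = candidate
print(f"confidence: {start:.3f} -> {confidence(img):.3f}")
```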
Slides at http://s.yosinski.com/nyai.pdf
In the second talk, Ken Sanford @Ekenomics and H2O.ai talked about the H2O open source project. H2O is a machine learning engine that can run in R, Python, Java, etc.
Ken emphasized how H2O (whose deep learning model is a multilayer feed-forward neural network) provides a platform that uses the Java Score Code engine. This eases the transition from the model developed in training to the model used to score inputs in a production environment.
He also talked about the Deep Water project, which aims to allow other open source tools, such as MXNet, Caffe, TensorFlow,… (CNN, RNN, … models), to run in the H2O environment.
#Unsupervised Learning (Soumith Chintala) & #Music Through #ML (Brian McFee)
Posted on July 26th, 2016
07/25/2016 @Rise, 28 West 24rd Street, NY, 2nd floor
Two speakers spoke about machine learning
In the first presentation, Brian McFee @NYU spoke about using ML to understand the patterns of beats in music. He builds a graph of beats described by Mel-frequency cepstral coefficients (#MFCCs).
Random walk theory combines two representations of points in the graph.
- Local: In the graph, each point is a beat, and edges connect adjacent beats. Edges are weighted by MFCC similarity.
- Repetition: Link each beat to its k nearest neighbors, i.e. beats with the same sound (repetitions). Weight by similarity (k is set to the square root of the number of beats).
- Combination: A = mu * local + (1-mu) * repetition; mu is chosen for a balanced random walk, so that the probability of a local move ≈ the probability of a repetition move over all vertices. Use a least-squares optimization to find mu so the two parts of the equation make equal contributions across all points to the value of A.
The points are then partitioned by spectral clustering: compute the normalized Laplacian L, take its bottom eigenvectors (which encode component membership for each beat), and cluster the rows of the eigenvector matrix Y to reveal the structure. This gives a hierarchical decomposition of the time series: m=1 gives the entire song; m=2 gets the two components of the song. As you add more eigenvectors, the number of segments within the song increases.
Brian then showed how this segmentation can create compelling visualizations of the structure of a song.
The Python code used for this analysis is available in the msaf library.
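A compact NumPy sketch of the graph construction and spectral clustering described above, loosely following McFee and Ellis's Laplacian segmentation. It is not the msaf implementation; the Gaussian similarity kernel, the median-distance bandwidth, the mutual k-nearest-neighbour rule, and the small k-means routine are assumptions. The balance parameter is the least-squares solution of mu * deg_local = (1 - mu) * deg_repetition over all beats, i.e. mu = <d_rep, d_loc + d_rep> / ||d_loc + d_rep||^2.

```python
import numpy as np


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        centers = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels


def laplacian_segments(features, m=2):
    """features: (n_beats, n_dims) beat-synchronous descriptors such as MFCCs."""
    n = len(features)
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    sigma = np.median(dist[dist > 0])
    sim = np.exp(-dist ** 2 / (2 * sigma ** 2))

    # Local graph: edges between consecutive beats, weighted by similarity.
    local = np.zeros((n, n))
    i = np.arange(n - 1)
    local[i, i + 1] = local[i + 1, i] = sim[i, i + 1]

    # Repetition graph: mutual k-nearest neighbours, k = sqrt(number of beats).
    k = int(np.sqrt(n))
    nn = np.argsort(dist + np.diag(np.full(n, np.inf)), axis=1)[:, :k]
    knn = np.zeros((n, n), dtype=bool)
    knn[np.arange(n)[:, None], nn] = True
    repetition = np.where(knn & knn.T, sim, 0.0)

    # Least-squares mu balancing local and repetition moves at every beat.
    d_loc, d_rep = local.sum(axis=1), repetition.sum(axis=1)
    mu = d_rep @ (d_loc + d_rep) / np.sum((d_loc + d_rep) ** 2)
    A = mu * local + (1 - mu) * repetition

    # Symmetric normalised Laplacian, bottom-m eigenvectors, row-normalised, then clustered.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, :m]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return kmeans(Y, m)


rng = np.random.default_rng(0)


def section(center, n_beats=15):
    """Noisy copies of one 'sound' (feature vector)."""
    return center + 0.1 * rng.standard_normal((n_beats, len(center)))


a, b = np.zeros(5), np.full(5, 3.0)
features = np.vstack([section(a), section(b), section(a)])  # A-B-A form, 45 beats
labels = laplacian_segments(features, m=2)
print(labels)
```

On this synthetic A-B-A sequence, the repetition edges tie the two A sections together, so with m=2 they receive the same label even though they are not adjacent in time.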
He has worked on convolutional neural nets, but finds them to be better at handling individual notes within the song (by contrast, rhythm is over a longer time period).
In the second presentation, Soumith Chintala talked about #GenerativeAdversarialNetworks (GAN).
Generative networks consist of a #NeuralNet “generator” that produces an image. It takes as input a 100-dimensional vector of random noise. In a Generative Adversarial Network, the generator creates an image which is optimized over a loss function that evaluates “does it look real?”. The decision of whether the image looks real is made by a second neural net, the “discriminator”, that tries to pick the fake image out of a set of real images plus the output of the generator.
Both the generator and discriminator NNs are trained by gradient descent to optimize their individual performance: the discriminator maximizes and the generator minimizes the same objective (a minimax game). With an optimal discriminator, the generator’s objective amounts to minimizing the Jensen-Shannon divergence between the real and generated distributions.
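The objective behind this game (Goodfellow et al., 2014) is V(D, G) = E_data[log D(x)] + E_gen[log(1 - D(x))], which the discriminator maximizes and the generator minimizes. For a fixed generator the best discriminator is D*(x) = p_data(x) / (p_data(x) + p_gen(x)), and plugging it in gives V = -log 4 + 2 JSD(p_data, p_gen). The sketch checks that identity numerically on two made-up distributions over three outcomes; no networks are trained.

```python
import numpy as np


def kl(p, q):
    """Kullback-Leibler divergence for strictly positive discrete distributions."""
    return float(np.sum(p * np.log(p / q)))


def jensen_shannon(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def gan_value(d, p_data, p_gen):
    """V(D, G) = E_data[log D(x)] + E_gen[log(1 - D(x))] over a small discrete sample space."""
    return float(np.sum(p_data * np.log(d)) + np.sum(p_gen * np.log(1 - d)))


p_data = np.array([0.5, 0.3, 0.2])  # "real image" distribution (toy)
p_gen = np.array([0.2, 0.3, 0.5])   # current generator's distribution (toy)

d_star = p_data / (p_data + p_gen)  # best possible discriminator for this generator
v_star = gan_value(d_star, p_data, p_gen)
print(v_star, -np.log(4) + 2 * jensen_shannon(p_data, p_gen))
```

When the generator matches the data exactly, the JSD term vanishes and the best the discriminator can do is D = 1/2 everywhere, giving V = -log 4.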
Soumith then talked about extensions to GAN. These include
Class-conditional GANs – take noise + class of samples as input to the generator.
Video prediction GANs – predict what happens next given the previous 2 or 3 frames. An MSE loss (in addition to the discriminator classification loss) compares what happened to what is predicted.
Deep Convolutional GAN (DCGAN) – tries to make the learning more stable by using a CNN.
Text-conditional GAN – input =noise + text. Use LSTM model on the text input. Generate images
Disentangling representations – InfoGAN – input random noise + categorical variables.
GAN is still unstable especially for larger images, so work to improve it includes
- Feature matching – train the generator to match statistics of groups of the discriminator’s intermediate features instead of just the real/fake verdict on the whole image.
- Minibatch learning
No one has successfully used GAN for text-in to text-out
The meeting was concluded by a teaser for Watchroom – crowd funded movie on AI and VR.
Automatically scalable #Python & #Neuroscience as it relates to #MachineLearning
Posted on June 28th, 2016
06/28/2016 @Rise, 43 West 23rd Street, NY, 2nd floor
Braxton McKee (@braxtonmckee) @Ufora first spoke about the challenges of creating a version of Python (#Pyfora) that naturally scales to take advantage of the hardware to handle parallelism as the problem grows.
Braxton presented an example in which we compute the minimum distance from target points to a larger universe of points, based on their Cartesian coordinates. This is easily written for small problems, but the computation needs to be optimized when computing this value across many CPUs.
However, the allocation across CPUs depends on the number of targets relative to the size of the point universe. Instead of trying to solve this analytically, they use a #Dynamicrebalancing strategy that splits the task and adds resources to the subtasks that are creating bottlenecks.
This approach solves many resource allocation problems, but still faces challenges
- nested parallelism. They look for parallelism within the code and look for bottlenecks at the top level of parallelism and split the task into subtasks at that level, …
- the data do not fit in memory. They break tasks into smaller tasks. They also have each task know which other caches hold data, so they can be accessed directly without going to slower main memory
- different types of architectures (such as GPUs) require different types of optimization
- the optimizer cannot look inside python packages, so cannot optimize a bottleneck within a package.
Pyfora:
- is a just-in-time compiler that moves stack frames from machine to machine and senses how to take advantage of parallelism
- tracks what data a thread is using
- dynamically schedules threads and data
- takes advantage of immutability, which allows the compiler to assume that functions do not change over time, so the compiler can look inside the function when optimizing execution
- is written on top of another language which allows for the possibility of porting the method to other languages
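The minimum-distance example from the start of this talk can be sketched in plain Python to show why the split depends on the shape of the problem: with many targets it is natural to split the targets, while with few targets it is better to split the point universe and combine the partial minima. This is not Pyfora's API; the worker count and the use of a thread pool are assumptions. Because of CPython's global interpreter lock, threads illustrate the decomposition but will not speed up this pure-Python arithmetic; a process pool or a system like Pyfora would be needed for real parallel speedups.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def nearest(target, points):
    """Distance from one target to its closest point."""
    return min(math.dist(target, p) for p in points)


def split(seq, n_parts):
    """Cut a list into at most n_parts contiguous chunks."""
    size = max(1, math.ceil(len(seq) / n_parts))
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def min_distances(targets, universe, n_workers=4):
    """Choose what to split based on the shape of the problem."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        if len(targets) >= n_workers:
            # Many targets: each worker gets a slice of targets and the whole universe.
            chunks = pool.map(lambda ts: [nearest(t, universe) for t in ts], split(targets, n_workers))
            return [d for chunk in chunks for d in chunk]
        # Few targets: split the universe and reduce the per-chunk minima with min().
        parts = split(universe, n_workers)
        return [min(pool.map(partial(nearest, t), parts)) for t in targets]


rng = random.Random(0)
universe = [(rng.random(), rng.random()) for _ in range(500)]
targets = [(rng.random(), rng.random()) for _ in range(20)]
print(min_distances(targets, universe)[:3])
print(min_distances(targets[:2], universe))
```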
In the second presentation, Jeremy Freeman @Janelia.org spoke about the relationship between neuroscience research and machine learning models. He first talked about early work on understanding the function of the visual cortex.
Findings by Hubel & Wiesel in 1959 have set the foundation for visual processing models for the past 40 years. They found that individual neurons in the V1 area of the visual cortex responded to the orientation of lines in the visual field. These inputs fed neurons that detect more complex features, such as edges, moving lines, etc.
Others also considered systems which have higher level recognition and how to train a system. These include
Perceptrons by Rosenblatt, 1957
Neocognitrons by Fukushima, 1980
Hierarchical learning machines, LeCun, 1985
Back propagation by Rumelhart, 1986
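As a concrete reference point for the earliest entry in that list, here is a minimal perceptron trained with the classic error-correction rule on the AND and OR truth tables (the learning rate and epoch count are arbitrary choices). A single perceptron cannot learn XOR, which is one reason the later multi-layer and back-propagation work mattered.

```python
def predict(weights, bias, x):
    """Threshold unit: fire (1) if the weighted sum exceeds zero."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Perceptron learning rule: adjust weights only on misclassified samples."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
for name, targets in [("AND", [0, 0, 0, 1]), ("OR", [0, 1, 1, 1])]:
    w, b = train_perceptron(inputs, targets)
    print(name, w, b, [predict(w, b, x) for x in inputs])
```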
His doctoral research looked at the activity of neurons in area V2. They found they could generate higher-order patterns that some neurons discriminate among.
But in 2012, there was a jump in performance of neural nets – U. of Toronto
By 2014, some neural network algorithms performed better than humans and primates, especially in image processing. This has led to many advances, such as Google DeepDream, which combines images and textures to create an artistic hybrid image.
Recent scientific research allows one to look at thousands of neurons simultaneously. He also talked about some of his current research, which uses “tactile virtual reality” to examine neural activity as a mouse explores a maze (the mouse walks on a ball that senses its steps as it learns the maze).
Jeremy also spoke about model-free episodic control for complex sequential tasks requiring memory and learning. ML research has created models such as LSTMs and Neural Turing Machines, which retain state representations. Graham Taylor has looked at neural feedback modulation using gates.
He also noted that there are similar functionalities in the V1 area of the visual cortex, the A1 auditory area, and the S1 tactile area.
To find out more, he suggested visiting his GitHub site, Freeman-lab, and looking at the web site neurofinder.codeneuro.org.
#TensorFlow and Cloud Machine Learning
Posted on June 7th, 2016
06/06/2016 @Audible Inc, 1 Washington Place, Newark, NJ 15th floor
Joshua Gordon @Google talked about #MachineLearning and the TensorFlow package. TensorFlow is an open source library for machine learning. Using the library, you can manipulate tensors (multidimensional arrays) by defining computation graphs whose operations act on them.
The library runs on Linux and OS X, and on Windows using Docker. Support for Android is on the way. Joshua showed several applications, including one that repaints an image in van Gogh’s style by merging layers from a network identifying colors in the original image with layers from a second network trained on the painter’s style.
Next, Yufeng Guo @Google talked about out-of-the-box machine learning APIs to classify images. Google has a cloud vision API and will shortly release a speech API.
The vision API takes a JPEG file and outputs a description in JSON format, including the items identified and the confidence that they are correctly identified. It also gives the coordinates of the items identified and links to full descriptions of the items in Google’s database. The face detection routine also outputs information such as rollAngle, joyLikelihood, etc. The service is free for up to 1000 requests per month.
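A minimal sketch of reading such a response in Python. The field names (responses, labelAnnotations, description, score, faceAnnotations, rollAngle, joyLikelihood) follow the Cloud Vision v1 REST response format as I understand it; the sample values are made up, and the request and authentication steps are omitted.

```python
import json

# Illustrative response only: the values below are invented for this example.
SAMPLE_RESPONSE = """
{
  "responses": [
    {
      "labelAnnotations": [
        {"description": "person", "score": 0.97},
        {"description": "smile", "score": 0.81},
        {"description": "hat", "score": 0.32}
      ],
      "faceAnnotations": [
        {"rollAngle": -3.2, "joyLikelihood": "VERY_LIKELY"}
      ]
    }
  ]
}
"""


def summarize(response_text, min_score=0.5):
    """Pull confident label/score pairs and a few face attributes out of a response."""
    summaries = []
    for result in json.loads(response_text).get("responses", []):
        labels = [(a["description"], a["score"])
                  for a in result.get("labelAnnotations", [])
                  if a.get("score", 0.0) >= min_score]
        faces = [(f.get("rollAngle"), f.get("joyLikelihood"))
                 for f in result.get("faceAnnotations", [])]
        summaries.append({"labels": labels, "faces": faces})
    return summaries


print(summarize(SAMPLE_RESPONSE))
```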
#DataDrivenNYC: #FaultTolerant #Web sites, #Finance, Predicting #B2B buying behavior, training #DeepLearning
Posted on May 18th, 2016
05/18/2016 @AXA auditorium, 787 7th Avenue, NY
Four speakers presented:
- Peter Brodsky, Founder and CEO of HyperScience (AI for the enterprise)
- Louis DiModugno, Chief Data and Analytics Officer at AXA US (global leader in insurance)
- Amanda Kahlow, Founder and CEO of 6Sense (B2B predictive intelligence)
- Nicolas Dessaigne, Founder and CEO of Algolia (hosted search API that delivers instant results)
First, Nicolas Dessaigne @Algolia (Subscription service to access a search API) talked about the challenges building a highly fault-tolerant world-wide service. The steps resulted from their understanding of points of failure within their systems and the infrastructure their systems depend on.
Initially, they concentrated on their software development process, including failed updates. To overcome these problems, they update one server at a time (within a rack of servers), do partial updates, and use Chef to automate deployment.
Then they moved their DNS from the .io TLD to .net to avoid the slow response times they had seen intermittently in Asia. This was followed by these upgrades:
Feb 2015. Set up clusters of servers world-wide, so users have a server in their region: lower latency
March 2015. Spread the server clusters within a region across different providers
May 2015. Create fallback DNS servers
July 2015. Put a third data center online to make indexing robust
April 2016. Implement a 1 second granularity for their system monitoring
Next, Matt Turck interviewed Louis DiModugno @AXA. In the US, AXA’s main focus is on predictive underwriting of insurance process. They also have projects to incorporate sensors into products and correctly route queries to call centers based on the demographics of the customer. World-wide they have three analysis hubs: France, US, Singapore (coming online).
Louis oversees both data and analytics in the U.S. and both he and the CTO report to the CIO. They are interested in expanding their capabilities in areas such as creating unstructured databases from life insurance data that are currently on microfiche.
In the third presentation, Amanda Kahlow @6Sense talked about their business model to provide information to customers in B2B commerce. They analyze business searches, customer web sites, and visits to publishers’ (e.g. Forbes) web sites. Their goal is to determine the timing of customer purchases.
B2B purchases are different from B2C purchases since
- Businesses research their purchases online before they buy
- The research takes time (long sales cycle)
- The decision to buy involves multiple people within the company
So there are few impulse buys, and buyer behavior signals that a purchase is imminent.
The main CMO question is when (not who).
6sense ties data across searches (anonymous data). The goal is to identify when companies are in a specific part of the buying cycle, so sales can approach them now. (Example: show click-to-chat when the analytics says that the customer is ready to buy)
Lastly, Peter Brodsky @HyperScience spoke about tools they are developing to speed machine learning. These include
- Tools to make it easier to add new data sets
- need to match fields, such as date which may be in different formats
- what to do with missing data
- need labeled data – lots of examples
- Speed up training time
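Matching date fields that arrive in different formats, and flagging missing values, can be sketched with the standard library. The list of accepted formats is a hypothetical example, and ambiguous inputs such as 03/04/2016 are resolved by whichever format is tried first (US month/day here).

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical list of formats seen across incoming data sets, tried in order.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y", "%B %d, %Y"]


def normalize_date(value) -> Optional[date]:
    """Return a datetime.date, or None for missing or unparseable values."""
    if value is None:
        return None
    text = str(value).strip()
    if not text:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


for raw in ["2016-05-18", "05/18/2016", "18 May 2016", "May 18, 2016", "", "n/a"]:
    print(repr(raw), "->", normalize_date(raw))
```

Returning None rather than guessing keeps missing or malformed values visible, so they can be handled explicitly before training.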
The speed up is done by identifying subnets within the larger neural network. The subnets perform distinct functions. To determine if two subnets (in different networks) are equivalent, move one subnet from one network to replace another subnet in another network and see if the function is unchanged: Freeze the weights within the subnet and outside the subnet. Retrain the interface between the net and the subnet.
This creates building blocks which can be combined into larger blocks. These blocks can be applied to jump start the training process.