American think tank analyzes the potential of small data and artificial intelligence
Small data methods are artificial intelligence methods that do not require large datasets for training. They help in situations where labeled data is missing or scarce, and they reduce dependence on collecting large datasets from the real world. Small data methods can be roughly divided into five categories: ① Transfer learning, which first learns to perform a task in a data-rich setting and then "transfers" what it has learned to a task where data is scarce; ② Data labeling, suited to situations where labeled data is limited but a large amount of unlabeled data is available, using methods such as automatic label generation or active learning to make sense of the existing unlabeled data; ③ Artificial data generation, which aims to extract the maximum information from a small amount of data by creating new data points or through related techniques; ④ Bayesian methods, which use machine learning and statistical methods to incorporate structural information about the problem into the problem-solving approach, with a focus on producing well-calibrated estimates of the uncertainty of their predictions; ⑤ Reinforcement learning, in which computer systems learn how to interact with an environment through trial and error, and which is often used to train game-playing systems, robots, and autonomous vehicles.
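To make the Bayesian category more concrete, the following is a minimal sketch, not taken from the report, of how prior knowledge and a handful of observations can be combined into an estimate with an explicit uncertainty. It assumes a simple Beta-Binomial model; the prior values and the observation counts are hypothetical and chosen only for illustration.

```python
from math import sqrt


def beta_posterior(prior_a, prior_b, successes, failures):
    """Update a Beta(prior_a, prior_b) prior with observed binary outcomes."""
    return prior_a + successes, prior_b + failures


def beta_mean_std(a, b):
    """Return the mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)


# Hypothetical prior: domain knowledge suggests a rate near 20%,
# encoded here as Beta(2, 8). These numbers are illustrative only.
prior = (2.0, 8.0)

# Only five observations are available: 3 successes and 2 failures.
post = beta_posterior(*prior, successes=3, failures=2)

prior_mean, prior_std = beta_mean_std(*prior)
post_mean, post_std = beta_mean_std(*post)

print(f"prior:     mean={prior_mean:.3f}, std={prior_std:.3f}")
print(f"posterior: mean={post_mean:.3f}, std={post_std:.3f}")
```

With so few observations the estimate moves only part of the way from the prior toward the observed rate, and the reported standard deviation stays wide, which is the kind of explicit uncertainty estimate the Bayesian category emphasizes.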
1. Narrow the gap in artificial intelligence capabilities between entities
Large datasets are becoming increasingly important for many artificial intelligence applications. Because entities differ in their ability to collect, store, and process data, large technology companies with strong artificial intelligence capabilities may widen their lead over other companies. If transfer learning, automatic labeling, Bayesian methods, and similar techniques allow artificial intelligence to be applied with limited data, the data-related barrier to entry for small entities will be lowered, which can narrow the gap in artificial intelligence capabilities between large and small entities.
2. Reduce the collection of personal data
Some small data methods can reduce the need to collect personal data. Artificial data generation and the use of simulations to train algorithms, for example, either do not rely on personal data at all or can synthesize data in a way that removes sensitive personally identifiable attributes. This does not resolve every privacy issue, but reducing the need to collect large amounts of real-world data can ease concerns about the large-scale collection, use, or disclosure of consumers' personal data.
3. Promote the development of data-scarce areas
Many recent advances in artificial intelligence have been made possible by the explosive growth of available data. For many important problems, however, the data that can be fed into artificial intelligence systems may be scarce or non-existent. Small data methods can provide a principled way to handle this lack of data. They can draw on both labeled and unlabeled data to transfer knowledge from related problems. They can also use the small number of existing data points to create more, rely on prior knowledge about the problem domain, or venture into new domains by building simulations or encoding structural assumptions.
4. Avoid "dirty data"
Small data methods can benefit institutions troubled by "dirty data". For example, the US Department of Defense holds a large amount of "dirty data" that requires a great deal of time and manpower to clean, label, and organize. The data labeling methods among the small data methods can make large amounts of unlabeled data easier to process by generating labels automatically. Transfer learning, Bayesian methods, or artificial data methods can reduce the amount of data that needs to be cleaned, significantly reducing the scale of the "dirty data" problem.
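As an illustration of automatic label generation, here is a minimal sketch under stated assumptions and not the method described in the report. It assigns labels to unlabeled one-dimensional points by their nearest class centroid and keeps only the confident assignments; the remaining points are set aside for human review, in the spirit of active learning. The function name `pseudo_label`, the `margin` threshold, and the sample data are all hypothetical, and the sketch assumes at least two classes in the labeled set.

```python
def centroid(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def pseudo_label(labeled, unlabeled, margin):
    """Label 1-D points by nearest class centroid.

    A point is labeled automatically only when it is closer to the best
    centroid than to the runner-up by at least `margin`; otherwise it is
    returned as uncertain so a person can label it.
    """
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    centroids = {y: centroid(xs) for y, xs in groups.items()}

    auto, uncertain = [], []
    for x in unlabeled:
        ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
        best, second = ranked[0], ranked[1]
        gap = abs(x - centroids[second]) - abs(x - centroids[best])
        if gap >= margin:
            auto.append((x, best))
        else:
            uncertain.append(x)
    return auto, uncertain


# Hypothetical data: four hand-labeled points and five unlabeled ones.
labeled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
unlabeled = [0.5, 1.5, 5.2, 8.5, 10.0]

auto, uncertain = pseudo_label(labeled, unlabeled, margin=2.0)
print("auto-labeled:", auto)
print("needs review:", uncertain)
```

Only the ambiguous point near the midpoint between the two classes is routed to a person, so the manual labeling effort scales with the number of hard cases rather than with the size of the dataset.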
1. Artificial intelligence is not the same as big data, nor is it synonymous with large, pre-labeled datasets. Big data has played a role in the artificial intelligence boom of the past decade, but treating large-scale data collection and labeling as a prerequisite for developing artificial intelligence can easily lead policy makers astray.
2. Research on transfer learning is developing rapidly, and the method may be applied more widely in the future.
3. Competition between the United States and China in the field of small data methods is very fierce. The United States has significant advantages in reinforcement learning and Bayesian methods, but China is leading in transfer learning.
4. Compared to the overall scale of investment in artificial intelligence, the US government's funding for small data methods is relatively small. As a rapidly emerging field, transfer learning has the potential to receive more funding from the US government.
Source: National Defense Technology News