Draft:Filter and Refine Principle

From Wikipedia, the free encyclopedia


The Filter and Refine Principle (FRP), also called the Filtering and Refinement Principle,[1][2][3] is a widely recognized computational strategy in computer science that efficiently processes large sets of objects through a two-stage approach: filtering and refinement.

The filtering stage quickly eliminates less promising or irrelevant objects from a large set using efficient, less resource-intensive algorithms. This stage is designed to reduce the volume of data that needs to be processed in the more resource-demanding refinement stage.

Following filtering, the refinement stage applies more complex and computationally expensive techniques to the remaining objects to achieve higher accuracy via finer-grained processing. This stage is essential for obtaining the desired quality and precision in the results.

Explanations

FRP follows a two-step processing strategy:

  1. Filter: an efficient filter function f is applied to each object o in the dataset S. The filtered subset is defined as S_f = { o ∈ S : f(o) ≥ θ } for value-based tasks, where θ is a threshold value, or S_f = { o ∈ S : f(o) ∈ T } for type-based tasks, where T is the target type(s).
  2. Refine: a more complex refinement function r is applied to each object in S_f, resulting in the set R = { r(o) : o ∈ S_f }, or likewise, R = { o ∈ S_f : r(o) = true }, as the final output.
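The two steps above can be sketched in a few lines of Python; the function and parameter names here (cheap_score, expensive_match) are illustrative placeholders, not part of any standard API:

```python
def filter_and_refine(objects, cheap_score, threshold, expensive_match):
    """Return objects that pass a cheap filter and an expensive refinement."""
    # Filter stage: inexpensive scoring prunes unpromising objects.
    candidates = [o for o in objects if cheap_score(o) >= threshold]
    # Refine stage: the costly exact check runs only on the survivors.
    return [o for o in candidates if expensive_match(o)]

# Toy example: find strings containing "kernel", first filtering on length.
docs = ["kernel panic", "hi", "the kernel scheduler", "no match here"]
result = filter_and_refine(
    docs,
    cheap_score=len,                          # cheap proxy: short strings can't match
    threshold=len("kernel"),
    expensive_match=lambda s: "kernel" in s,  # stands in for a costly check
)
print(result)  # ['kernel panic', 'the kernel scheduler']
```

Because the refinement predicate is only evaluated on the filtered candidates, the expensive check runs far fewer times than a naive scan of the full dataset would require.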

This strategy balances the trade-offs between processing speed and accuracy, which is crucial in situations where resources such as time, memory, or compute are limited.

FRP is a general method for completing a computationally intensive task as quickly as possible [1]. The strategy is widely used across fields and applications, from database indexing, query processing, and information retrieval to machine learning and big data analytics. Its implementation helps optimize systems to better manage the inherent trade-offs between speed and accuracy [4].

Related Concepts

Reinforcement Learning

In the domain of artificial intelligence, Reinforcement Learning (RL) [5] demonstrates the Filter and Refine Principle (FRP) through the processes of policy and value function estimation. In RL, agents learn to make decisions by exploring the environment and receiving feedback in the form of rewards. For example, in AlphaZero [6], the filtering stage in RL involves narrowing down the set of possible actions at each state by using a policy network, which predicts potentially rewarding actions based on previous experiences. This approach reduces the search space significantly, enabling more efficient learning processes.

The refinement stage in RL involves more detailed simulations or deeper analysis through techniques like Monte Carlo tree search (MCTS) or temporal difference learning, which refine the policy and value estimates to optimize long-term rewards. This step is crucial for fine-tuning the strategy to achieve optimal performance, particularly in complex environments where numerous possible actions and outcomes exist. RL's application of FRP is critical in scenarios requiring both quick decision-making and high accuracy, such as in autonomous vehicles and gaming.
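As a simplified illustration of this idea (not AlphaZero's actual algorithm), a policy prior can filter the action set before costly rollouts refine the value estimates; the prior and simulator below are invented toy stand-ins for a policy network and an environment model:

```python
import random

def select_action(state, policy_prior, simulate, k=3, rollouts=32):
    """Pick an action by filtering with a policy prior, then refining via rollouts."""
    priors = policy_prior(state)  # {action: prior probability}
    # Filter stage: keep only the k most promising actions per the policy network.
    candidates = sorted(priors, key=priors.get, reverse=True)[:k]
    # Refine stage: estimate each candidate's value with costly simulations.
    values = {a: sum(simulate(state, a) for _ in range(rollouts)) / rollouts
              for a in candidates}
    return max(values, key=values.get)

# Toy illustration: a fixed prior and a noisy reward simulator.
prior = lambda s: {"left": 0.5, "right": 0.3, "up": 0.15, "down": 0.05}
true_reward = {"left": 0.2, "right": 0.9, "up": 0.4, "down": 1.0}
sim = lambda s, a: true_reward[a] + random.gauss(0, 0.1)
random.seed(0)
print(select_action("s0", prior, sim))  # prints "right"
```

Note that "down" has the best true reward but never survives the filter; this exhibits the speed/accuracy trade-off the filtering stage accepts in exchange for a much smaller search space.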

Mixture of Experts

The mixture of experts (MoE) is a machine learning paradigm that incorporates FRP by dividing a complex problem into simpler, manageable sub-tasks, each handled by a specialized expert [7]. In the filtering stage, a gating mechanism acts as a filter, determining the most suitable expert for each part of the input based on the data's characteristics. This stage is critical because it allows the system to allocate resources efficiently, engaging only the most relevant experts for each task.

During the refinement stage, each chosen expert applies its specialized knowledge to process the input more thoroughly [8]. This approach enhances the overall system’s performance by combining the strengths of various experts tailored to different aspects of the data. The refinement ensures that each segment of the input is processed optimally, leading to improved accuracy and adaptability of the model across diverse scenarios. MoE models are particularly effective in tasks where different types of expertise are beneficial, such as in complex decision-making systems and large-scale classification problems.
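A minimal sketch of sparse gating follows, with a hypothetical hand-written gate and trivial experts standing in for learned networks:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def mixture_of_experts(x, gate, experts, k=1):
    """Sparse MoE: the gate filters experts; the chosen experts refine the input."""
    scores = gate(x)  # one score per expert
    # Filter stage: keep only the top-k experts by gating score.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])  # renormalize over survivors
    # Refine stage: only the selected experts actually process the input.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy example: one expert specializes in negative inputs, one in the rest.
gate = lambda x: [1.0 if x < 0 else -1.0, -1.0 if x < 0 else 1.0]
experts = [lambda x: -x, lambda x: x * 2]      # hypothetical specialists
print(mixture_of_experts(-3, gate, experts))   # 3.0 (expert 0 handles negatives)
print(mixture_of_experts(4, gate, experts))    # 8.0 (expert 1 handles the rest)
```

With k=1 only a single expert's computation runs per input, which is how sparse MoE models keep per-example cost low even as the total number of experts grows.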

Cascading Classifiers

Cascading classifiers showcase the Filter and Refine Principle within the domain of computer vision [9]. This approach arranges a series of classifiers in a cascade, each incrementally increasing in complexity and precision. The initial stages typically employ simple, fast classifiers that quickly filter out obvious negatives, effectively reducing the volume of data that subsequent classifiers must process.

As the data progresses through the cascade, each subsequent classifier deals with a progressively smaller, but more challenging, subset of data. These classifiers are designed to focus on harder cases that have passed through the initial filters. This staged refinement helps in minimizing computational costs while maintaining or improving the accuracy of the final decision.
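The early-rejection behavior can be illustrated with a toy cascade; the stages below are placeholders for trained classifiers of increasing cost:

```python
def cascade(stages, sample):
    """Run classifiers in order of cost; reject as soon as any stage says no."""
    for stage in stages:       # cheap stages first, expensive stages last
        if not stage(sample):
            return False       # early rejection: later stages never run
    return True                # survived every stage: accept

# Toy cascade for "detecting" the value 42 in a noisy stream.
stages = [
    lambda x: isinstance(x, int),  # stage 1: trivially cheap type check
    lambda x: 40 <= x <= 45,       # stage 2: coarse range check
    lambda x: x == 42,             # stage 3: exact (pretend-expensive) check
]
stream = [3.14, "noise", 41, 42, 100]
print([x for x in stream if cascade(stages, x)])  # [42]
```

Most inputs are rejected by the first one or two stages, so the average cost per sample stays close to that of the cheapest classifier.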

Cascading classifiers are particularly useful in real-time processing scenarios, such as face detection in video streams [10], where the ability to quickly discard non-relevant sections of the input data stream is crucial. This method not only improves processing speed but also enhances the system's ability to handle complex tasks under resource constraints.

Database Indexing

Indexing is a fundamental application of FRP in database management systems, where the filter stage is implemented through indexing structures such as B-trees [11], hash tables, or inverted indexes [12]. These structures quickly filter out large volumes of data to find relevant subsets based on query criteria, significantly reducing the amount of data that must be accessed during query processing. This efficient filtering is critical in databases with large datasets, enabling quick response times even under heavy query loads.

The refinement stage in indexing involves accessing the actual data entries after they have been identified by the index. This stage may involve further filtering based on more complex query criteria or performing additional calculations such as aggregations or joins. By refining the initially filtered results, the system ensures that only the most relevant data is processed and presented to the user, optimizing both the accuracy and performance of the database.
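A simplified illustration follows, using Python's bisect module on a sorted column as a stand-in for a B-tree index; the table and query are invented for the example:

```python
import bisect

# Hypothetical table of (age, name) rows, plus a sorted index on age.
rows = [(25, "Ana"), (31, "Bo"), (31, "Cy"), (40, "Di"), (58, "Eve")]
ages = [age for age, _ in rows]  # index column, kept sorted

def query(low, high, predicate):
    """Filter via the index, then refine surviving rows with the full predicate."""
    # Filter stage: binary search narrows the scan to the matching age range.
    lo = bisect.bisect_left(ages, low)
    hi = bisect.bisect_right(ages, high)
    candidates = rows[lo:hi]
    # Refine stage: the residual predicate runs only on the fetched rows.
    return [r for r in candidates if predicate(r)]

print(query(30, 45, lambda r: r[1].startswith("C")))  # [(31, 'Cy')]
```

The index answers the range condition in logarithmic time; only the small candidate set is then touched to evaluate the residual predicate.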

Query Processing

Query processing in large databases and information retrieval systems frequently employs FRP to handle vast amounts of data efficiently [13]. Initially, a query processor filters data through mechanisms such as query optimization, in which queries are reformulated and simplified to reduce computational cost. This process might involve selecting the most efficient query plan or using statistical estimates to quickly prune large sections of data that do not match the query criteria. This initial filtering stage is crucial to ensure that the system uses its resources effectively, preparing the ground for more detailed examination [14].

In the refinement stage of query processing, the system performs a detailed execution of the chosen query plan. This involves accessing the actual data, applying more complex filters, and performing operations such as joins, aggregations, and sorts on the narrowed-down data set. This stage refines the results to meet the query's exact specifications, ensuring high precision and relevance. This dual-stage processing is fundamental in environments with large, complex data sets where performance and accuracy are critical, such as in online transaction processing systems and large-scale analytical queries.
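One common filtering mechanism of this kind, pruning partitions with min/max statistics (often called zone maps), can be sketched as follows; the partition layout is invented for illustration:

```python
# Hypothetical partitioned table with per-partition min/max statistics.
partitions = [
    {"min": 0,   "max": 99,  "values": [5, 42, 99]},
    {"min": 100, "max": 199, "values": [120, 150]},
    {"min": 200, "max": 299, "values": [201, 250, 299]},
]

def run_query(low, high):
    """Filter partitions via statistics; refine by scanning only survivors."""
    result, scanned = [], 0
    for part in partitions:
        # Filter stage: skip any partition whose range cannot overlap the query.
        if part["max"] < low or part["min"] > high:
            continue
        # Refine stage: scan the rows of the surviving partitions.
        scanned += 1
        result.extend(v for v in part["values"] if low <= v <= high)
    return result, scanned

print(run_query(130, 210))  # ([150, 201], 2) -- the first partition is never scanned
```

The statistics check is a constant-time comparison per partition, while the refinement scan touches only the partitions that can possibly contribute rows.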

History and Development

The Filter and Refine Principle (FRP) has been a cornerstone in the evolution of computational systems, although it was not always explicitly recognized as a formal principle. Its origins can be traced back to early computing practices where efficiency and resource management were critical, leading to the development of algorithms and systems that implicitly used FRP-like strategies [15]. Over the decades, as computational resources expanded and the complexity of tasks increased, the need for formalizing such a principle became evident. This led to a more structured application of FRP across various domains [16][17], from databases and operating systems to network design and machine learning, where trade-offs between speed and accuracy are continuously managed.

The formal acknowledgment of FRP as a distinct principle in computer science is relatively recent. It has been increasingly cited in academic literature and industry practices as systems face growing volumes of data and demand for real-time processing [18]. This recognition is a testament to the evolving nature of technology and the need for frameworks that can adaptively manage the dual demands of efficiency and precision. Today, FRP is integral to the design of scalable systems that require handling large datasets efficiently, ensuring that it remains relevant in the era of big data and beyond.

Future Directions

The Filter and Refine Principle (FRP) is more than just a computational strategy; it represents a fundamental approach to problem-solving in computer science. By efficiently managing resources through the initial filtering of data and subsequently refining the processing, FRP enables systems to maintain high levels of performance without compromising on accuracy. As data continues to grow in size and complexity, the importance of such a principle only escalates. FRP provides a scalable solution that can adapt to various computational challenges, making it indispensable in the design and implementation of modern software and hardware systems.

Looking forward, the application of FRP is expected to expand further as emerging technologies such as artificial intelligence, machine learning, and large-scale simulation demand more sophisticated data processing capabilities. The principle's inherent flexibility and efficiency make it ideal for these advanced applications, ensuring that it will continue to play a crucial role in driving technological innovation and handling the exponential growth of data in the digital age.

References

  1. ^ Park, Ho-Hyun; Lee, Chan-Gun; Lee, Yong-Ju; Chung, Chin-Wan (1999). Early separation of filter and refinement steps in spatial query optimization. 6th International Conference on Advanced Systems for Advanced Applications. IEEE. doi:10.1109/DASFAA.1999.765748.
  2. ^ Ooi, Beng Chin; Pang, Hwee Hwa; Wang, Hao; Wong, Limsoon; Yu, Cui (2002). Fast filter-and-refine algorithms for subsequence selection. International Database Engineering and Applications Symposium. IEEE. doi:10.1109/IDEAS.2002.1029677.
  3. ^ Xing, Naili; Cai, Shaofeng; Chen, Gang; Luo, Zhaojing; Ooi, Beng Chin; Pei, Jian (2024). "Database Native Model Selection: Harnessing Deep Neural Networks in Database Systems". Proceedings of the VLDB Endowment. 17 (5): 1020–1033. doi:10.14778/3641204.3641212.
  4. ^ Abel, David J.; Gaede, Volker; Power, Robert A.; Zhou, Xiaofang (1999). "Caching strategies for spatial joins". GeoInformatica. 3: 33–59. doi:10.1023/A:1009844729517.
  5. ^ Sutton, Richard S.; Barto, Andrew G. (2018). Reinforcement learning: An introduction (PDF). MIT press.
  6. ^ Silver, David; Schrittwieser, Julian; Simonyan, Karen; Antonoglou, Ioannis; Huang, Aja; Guez, Arthur; Hubert, Thomas; Baker, Lucas; Lai, Matthew; Bolton, Adrian; Chen, Yutian; Lillicrap, Timothy; Hui, Fan; Sifre, Laurent; van den Driessche, George; Graepel, Thore; Hassabis, Demis (2017). Mastering the game of go without human knowledge. Nature. Vol. 550, no. 7676. pp. 354–359.
  7. ^ Shazeer, Noam; Mirhoseini, Azalia; Maziarz, Krzysztof; Davis, Andy; Le, Quoc; Hinton, Geoffrey; Dean, Jeff (2017). Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv:1701.06538.
  8. ^ Lin, Bin; Tang, Zhenyu; Ye, Yang; Cui, Jiaxi; Zhu, Bin; Jin, Peng; Huang, Jinfa; Zhang, Junwu; Ning, Munan; Yuan, Li (2024). MoE-LLaVA: Mixture of experts for large vision-language models. arXiv:2401.15947.
  9. ^ Viola, Paul; Jones, Michael (2001). Rapid object detection using a boosted cascade of simple features. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Vol. 1. IEEE. doi:10.1109/CVPR.2001.990517.
  10. ^ Lienhart, Rainer; Maydt, Jochen (2002). An extended set of haar-like features for rapid object detection. International Conference on Image Processing. Vol. 1. IEEE. doi:10.1109/ICIP.2002.1038171.
  11. ^ Comer, Douglas (1979). "Ubiquitous B-tree". ACM Computing Surveys (CSUR). 11 (2): 121–137. doi:10.1145/356770.356776.
  12. ^ Zobel, Justin; Moffat, Alistair (2006). "Inverted files for text search engines". ACM Computing Surveys (CSUR). 38 (2): 6–es. doi:10.1145/1132956.1132959.
  13. ^ Chaudhuri, Surajit (1998). An overview of query optimization in relational systems (PDF). ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems.
  14. ^ Graefe, Goetz (1993). "Query evaluation techniques for large databases". ACM Computing Surveys (CSUR). 25 (2): 73–169. doi:10.1145/152610.152611.
  15. ^ Stonebraker, Michael; Hachem, Nabil; Helland, Pat (2018). The end of an architectural era: It's time for a complete rewrite (PDF). CSAIL. pp. 463–489.
  16. ^ Patterson, David A. (2004). "Latency lags bandwidth". Communications of the ACM. 47 (10): 71–75. doi:10.1145/1022594.1022596.
  17. ^ Dean, Jeffrey; Ghemawat, Sanjay (2008). "MapReduce: simplified data processing on large clusters". Communications of the ACM. 51 (1): 107–113. doi:10.1145/1327452.1327492.
  18. ^ Jordan, M. I.; Mitchell, T. M. (2015). "Machine learning: Trends, perspectives, and prospects". Science. 349 (6245): 255–260. Bibcode:2015Sci...349..255J. doi:10.1126/science.aaa8415. PMID 26185243.
