Is web search and summarization (with iteration) really how we do research? What would an AI Agent that more closely mirrors how one actually researches look like?
Is the trust in the result from an AI a function of the result or a function of the AI? Meaning, we trust answers on the basis of prior experience with previous answers that have been confirmed to be true. This implies a verification feedback loop. Generating multiple results and averaging over the set is a popular way of verification in AI, but that strikes me as somewhat echo-chambery. Maybe breaking the walls of that chamber by using different models for some of the generations and for validation can help here. Ultimately, it seems that one cannot verify the correctness of an answer without applying it. That shifts the focus towards the “undo” space - if one can undo the results for “free”, there is no need to trust the answer up front. This assumes that we can verify that the result of applying it is the desired one (evaluation over measurement in a different space) and that it can be undone at acceptable cost, which is not always even possible - “let me amputate this limb; looks unnecessary - said AI” 😀. Introducing Human-In-The-Loop as an element of ensemble verification prior to execution is a cheat that can help in early stages. It can generalise better if we can identify qualified humans, but is really “just” an optimisation over the original done-by-human scenario.
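To make the cross-model idea a bit more concrete, here is a minimal sketch, assuming hypothetical model interfaces (`m.answer`, `judge.is_consistent`, `ask_human` are stand-ins, not any real API): several models generate candidate answers, a deliberately different model judges them, and a human is pulled in before execution whenever the models cannot agree.

```python
# Minimal sketch of breaking the "echo chamber": candidates come from several
# generator models, validation is done by a *different* judge model, and a
# human reviews before execution when there is no clear agreement.
# All model interfaces here are hypothetical placeholders.
from collections import Counter

def generate_candidates(question: str, models: list) -> list[str]:
    # Each model produces its own answer; in practice these would be API calls.
    return [m.answer(question) for m in models]

def cross_model_verify(question: str, candidates: list[str], judge) -> str | None:
    # The judge is deliberately a different model than the generators.
    approved = [c for c in candidates if judge.is_consistent(question, c)]
    if not approved:
        return None
    # Simple majority over the approved answers; anything weaker falls through.
    answer, votes = Counter(approved).most_common(1)[0]
    return answer if votes > len(candidates) // 2 else None

def answer_with_verification(question: str, generators, judge, ask_human):
    candidates = generate_candidates(question, generators)
    verified = cross_model_verify(question, candidates, judge)
    # Human-in-the-loop as the "cheat" prior to execution.
    return verified if verified is not None else ask_human(question, candidates)
```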
I think that verification needs to have multiple aspects - one based on trusted information sources, one based on understanding of the text using a different judge (yes, AI), but most importantly it is about creating transparency in the decision-making process, where I can always go and see the specific details that influenced a decision (or a summary), and I can go back in time and review/verify it as well.
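A minimal sketch of that transparency idea, assuming a made-up record structure rather than any specific framework: every decision keeps the sources consulted, the judge's verdict, and a timestamp, so it can be revisited and re-verified later.

```python
# Minimal sketch (hypothetical structure, not a specific framework): each
# decision records what influenced it, so it can be reviewed after the fact.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    question: str
    answer: str
    sources: list[str]   # trusted information sources that were consulted
    judge_verdict: str   # verdict from the independent judge model
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class DecisionLog:
    def __init__(self) -> None:
        self._records: list[DecisionRecord] = []

    def record(self, rec: DecisionRecord) -> None:
        self._records.append(rec)

    def review(self, question: str) -> list[DecisionRecord]:
        # "Go back in time": retrieve every record behind a given decision.
        return [r for r in self._records if r.question == question]
```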