[Lex Computer & Tech Group/LCTG] From an online Technology Review Service I get

Robert Primak bobprimak at yahoo.com
Sat Jun 24 09:07:26 PDT 2023


 The paradigm of an "AI model they’d trained themselves that looks for telltale signals of ChatGPT output" has been shown time and time again to spew out false positives. College professors have lost their jobs for making unsubstantiated accusations and penalizing students for allegedly using AI or ChatGPT, when in fact the students came back with notes and drafts proving definitively that their work was original and not generated by AI. An "accuracy rate" that allows for up to 40% false positives does not encourage placing any faith in AI models that claim to detect AI-produced content. The Swiss team was either lying, or else they were deceiving themselves. Put succinctly, AI can never definitively detect AI.
Nevertheless, there is a very real possibility that at least some LLMs are reusing data sets which were supplied with the help of AI models. The models might very well be feeding upon their own errors. And this does pose a real threat to the integrity of the information presented as output by all LLMs. How LLMs and other AI are trained does influence their behaviors when they are put into use.
Outsourcing any part of AI training does run the risk that the gig workers involved will take every shortcut they can to generate the illusion that they are producing more finished work in a shorter time interval. Gig workers are notorious for doing everything they can to avoid doing any real work.
The obvious solution is this: Pay qualified people to do the training work under the watchful eyes of in-house research staff. It's a lot more expensive, but it is the only way to make sure no one is taking the easy way out and endangering the integrity of the entire project. When you take shortcuts in technology development, it is your end users who invariably pay the price.
-- Bob Primak 






    On Saturday, June 24, 2023 at 11:43:34 AM EDT, jjrudy1 at comcast.net <jjrudy1 at comcast.net> wrote:  
 
This is scary.

ARTIFICIAL INTELLIGENCE

The people paid to train AI are outsourcing their work… to AI

It’s a practice that could introduce further errors into already error-prone models.

By Rhiannon Williams

June 22, 2023



STEPHANIE ARNETT/MITTR | GETTY

A significant proportion of people paid to train AI models may be themselves outsourcing that work to AI, a new study has found. 

It takes an incredible amount of data to train AI systems to perform specific tasks accurately and reliably. Many companies pay gig workers on platforms like Mechanical Turk to complete tasks that are typically hard to automate, such as solving CAPTCHAs, labeling data and annotating text. This data is then fed into AI models to train them. The workers are poorly paid and are often expected to complete lots of tasks very quickly. 

Related story: We are hurtling toward a glitchy, spammy, scammy, AI-powered internet. Large language models are full of security vulnerabilities, yet they’re being embedded into tech products on a vast scale.

No wonder some of them may be turning to tools like ChatGPT to maximize their earning potential. But how many? To find out, a team of researchers from the Swiss Federal Institute of Technology (EPFL) hired 44 people on the gig work platform Amazon Mechanical Turk to summarize 16 extracts from medical research papers. Then they analyzed their responses using an AI model they’d trained themselves that looks for telltale signals of ChatGPT output, such as lack of variety in choice of words. They also extracted the workers’ keystrokes in a bid to work out whether they’d copied and pasted their answers, an indicator that they’d generated their responses elsewhere.
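
To make that concrete, here is a rough sketch of the kind of signal such a detector might rely on. It is not the EPFL team's actual classifier; it only scores one telltale the article mentions, lack of variety in word choice, with a simple type-token ratio, and the threshold and sample texts below are made up for illustration.

    # Illustrative only: scores one signal named in the article, low lexical variety.
    import re

    def type_token_ratio(text: str) -> float:
        # Distinct words divided by total words; lower values mean less variety.
        words = re.findall(r"[a-z']+", text.lower())
        return len(set(words)) / len(words) if words else 0.0

    def flag_low_variety(summary: str, threshold: float = 0.5) -> bool:
        # The 0.5 cutoff is a made-up placeholder, not a value from the study.
        return type_token_ratio(summary) < threshold

    human_like = "The trial enrolled patients in two arms; half received the drug and half a placebo over two years."
    repetitive = "The study shows the drug works. The study shows the drug is safe. The study shows the drug helps. The study shows results."
    print(type_token_ratio(human_like), flag_low_variety(human_like))    # higher ratio, not flagged
    print(type_token_ratio(repetitive), flag_low_variety(repetitive))    # lower ratio, flagged

A real detector would combine many such signals (and, as the researchers did, side channels like keystroke and copy-paste behavior), which is exactly why any single cue is so easy to get wrong.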

They estimated that somewhere between 33% and 46% of the workers had used AI models like OpenAI’s ChatGPT. It’s a percentage that’s likely to grow even higher as ChatGPT and other AI systems become more powerful and easily accessible, according to the authors of the study, which has been shared on arXiv and is yet to be peer-reviewed. 

“I don’t think it’s the end of crowdsourcing platforms. It just changes the dynamics,” says Robert West, an assistant professor at EPFL, who coauthored the study. 

Using AI-generated data to train AI could introduce further errors into already error-prone models. Large language models regularly present false information as fact. If they generate incorrect output that is itself used to train other AI models, the errors can be absorbed by those models and amplified over time, making it more and more difficult to work out their origins, says Ilia Shumailov, a junior research fellow in computer science at Oxford University, who was not involved in the project.

Even worse, there’s no simple fix. “The problem is, when you’re using artificial data, you acquire the errors from the misunderstandings of the models and statistical errors,” he says. “You need to make sure that your errors are not biasing the output of other models, and there’s no simple way to do that.”
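
For a sense of why that is hard to undo, consider a toy sketch (not from the study): fit a simple model to some data, sample synthetic data from it, retrain on that output, and repeat. Rare outcomes tend to disappear within a few generations, and once a later model has never seen them, they cannot come back.

    # Toy sketch of training on model-generated data; names and numbers are illustrative.
    import random

    def fit(samples, categories):
        # "Train" a model by estimating category probabilities from the data.
        total = len(samples)
        return {c: samples.count(c) / total for c in categories}

    def generate(model, n):
        # Sample n synthetic data points from the fitted model.
        cats = list(model)
        return random.choices(cats, weights=[model[c] for c in cats], k=n)

    random.seed(0)
    categories = ["common", "uncommon", "rare"]
    true_model = {"common": 0.90, "uncommon": 0.09, "rare": 0.01}

    data = generate(true_model, 100)            # generation 0: "real" data
    for gen in range(1, 6):
        model = fit(data, categories)           # train on the previous generation's output
        data = generate(model, 100)             # ...then reuse that output as training data
        print(gen, {c: round(p, 2) for c, p in model.items()})
    # Once "rare" fails to appear in a generation's data, its estimated probability
    # hits zero and no later generation can ever produce it again.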

The study highlights the need for new ways to check whether data has been produced by humans or AI. It also highlights one of the problems with tech companies’ tendency to rely on gig workers to do the vital work of tidying up the data fed to AI systems.  

“I don’t think everything will collapse,” says West. “But I think the AI community will have to investigate closely which tasks are most prone to being automated and to work on ways to prevent this.”



  

  

John Rudy

  

781-861-0402

781-718-8334  cell

13 Hawthorne Lane

Bedford MA

jjrudy1 at comcast.net



  
===============================================
::The Lexington Computer and Technology Group Mailing List::
Reply goes to sender only; Reply All to send to list.
Send to the list: LCTG at lists.toku.us      Message archives: http://lists.toku.us/pipermail/lctg-toku.us/
To subscribe: email lctg-subscribe at toku.us  To unsubscribe: email lctg-unsubscribe at toku.us
Future and Past meeting information: http://LCTG.toku.us
List information: http://lists.toku.us/listinfo.cgi/lctg-toku.us
This message was sent to bobprimak at yahoo.com.
Set your list options: http://lists.toku.us/options.cgi/lctg-toku.us/bobprimak@yahoo.com
  
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.toku.us/pipermail/lctg-toku.us/attachments/20230624/03062ebf/attachment.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 1523958 bytes
Desc: not available
URL: <http://lists.toku.us/pipermail/lctg-toku.us/attachments/20230624/03062ebf/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 1990855 bytes
Desc: not available
URL: <http://lists.toku.us/pipermail/lctg-toku.us/attachments/20230624/03062ebf/attachment-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image003.png
Type: image/png
Size: 96832 bytes
Desc: not available
URL: <http://lists.toku.us/pipermail/lctg-toku.us/attachments/20230624/03062ebf/attachment.png>


More information about the LCTG mailing list