zeehio 11 hours ago

I find that the tricky part of a good data analysis is knowing the biases in your data, which often stem from the data collection process and are not contained in the data itself.

I have seen plenty of overoptimistic results due to improperly built training, validation, and test sets, or due to bad metrics used to evaluate trained models.
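A minimal sketch of one such leak, on made-up synthetic data (sizes and names are illustrative, not from this project): selecting features on the full dataset before splitting makes pure noise look predictive.

    set.seed(1)
    n <- 200; p <- 2000
    x <- matrix(rnorm(n * p), n, p)
    y <- rbinom(n, 1, 0.5)             # pure noise: no true signal
    # WRONG: screen features against y using ALL rows, then split
    scores <- abs(cor(x, y))
    keep   <- order(scores, decreasing = TRUE)[1:20]
    train  <- sample(n, 100)
    d      <- data.frame(y = y, x = x[, keep])
    fit    <- glm(y ~ ., data = d[train, ], family = binomial)
    p_hat  <- predict(fit, d[-train, ], type = "response")
    mean((p_hat > 0.5) == d$y[-train]) # typically well above 0.5 on noise

The fix is to do the screening inside the training split (or inside each cross-validation fold), never on data the model will later be evaluated against.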

It is not clear to me that this project is going to help overcome those challenges, and I am a bit concerned that if this project or similar ones become popular, these problems may become more prevalent.

Another concern is that the "customer" asking the question usually wants a specific result (something significant, some correlation...). If, through an LLM connected to this tool, my customer finds something that is wrong but aligned with what they want, then as a data scientist/statistician I will have the challenge of making the customer understand that the LLM gave a wrong answer. More work for me.

Maybe with some well-behaved datasets and proper context this project will become very useful; we will see :-)

  • rbartelme 10 hours ago

    I agree with all of this. I've worked in optical engineering, bioinformatics, and data science writ large for over a decade, and knowing the data collection process is foundational to statistical process control and statistical design of experiments. I've watched former employers light cash on fire chasing results from methods like the ones this MCP runs on the backend, all for lack of measurement/experimental context.

condwanaland 11 hours ago

I love R and am always excited about tools for R, but I immediately get suspicious when I see things like:

> RMCP has been tested with real-world scenarios achieving 100% success rate:

juujian 7 hours ago

This will kick off a real wave of AI slop hitting journals, won't it? There is already a p-hacking problem; no help needed.

If you run more than one test, you are bound to eventually get a false positive significant result.
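To put a number on it: with m independent tests of true nulls at level alpha, P(at least one false positive) = 1 - (1 - alpha)^m. A quick sanity check in R:

    alpha <- 0.05
    m <- 20
    1 - (1 - alpha)^m   # ~0.64: better-than-even odds of a spurious "finding"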

I don't know where I'm going with this. I'm using AI a lot myself, always supervised. This hits different.

pteetor 9 hours ago

All the Python-based functionality of this project can now be handled by the mcptools package[1]. That is, mcptools can field MCP requests and dispatch to R code; no need for an intermediate layer of Python. I wonder if the author knows about mcptools? Or did he start coding before it was available?

[1] https://posit-dev.github.io/mcptools/
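If I understand the mcptools docs correctly, the server side then reduces to pointing the MCP client at an R process; something like this client config entry (sketched from my reading of the README, so treat the exact invocation as unverified):

    {
      "mcpServers": {
        "r-mcptools": {
          "command": "Rscript",
          "args": ["-e", "mcptools::mcp_server()"]
        }
      }
    }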

nomilk 5 hours ago

Without additional setup, GPT-5 already uses Python as it deems necessary (e.g. for calculations). Is an R MCP server any different from GPT-5 (which automatically uses Python)? Reasoning: they're both an LLM plus a REPL, which (I think) makes them approximately equivalent? Or is there some advantage to using an MCP server?

hbarka an hour ago

R² without data visualization is savage.
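(The canonical demo ships with R: Anscombe's quartet, four datasets whose regressions have essentially identical R² but look nothing alike when plotted.)

    # Same R^2 four times; only a plot exposes the curve, the outlier, etc.
    sapply(1:4, function(i) {
      fit <- lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]])
      summary(fit)$r.squared
    })
    #> all four come out near 0.67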

rbartelme 10 hours ago

This MCP agent still doesn't defend the statistically illiterate from themselves.

boguscoder 7 hours ago

There’s something unsettling about AI agents being able to perform “machine learning”, as per the feature list.

jgalt212 6 hours ago

I understand this was probably easier to write in Python, but since it's calling out to R, would it have made more sense to write the entire thing in R?

Seattle3503 10 hours ago

rmcp is the name of the official Rust MCP library.

tacoooooooo 10 hours ago

I hate this so much and also great job