• v1.5: When Your Day Job Skills Meet Your Sunday Obsession

    I’ve spent the past few weeks building data pipelines and machine learning models. Not for work—for the Los Angeles Rams. On weekends. For free.

    This probably needs some context. I build AI agents, wrangle data, and occasionally argue with LLMs about what year it is—because I’m paid to, and because I genuinely enjoy it. But I also have a habit of applying those same skills to things nobody asked for. Most engineers have side projects. Usually it’s a half-finished app or yet another to-do list. Mine is a collection of machine learning models designed to predict which college prospects the Los Angeles Rams should draft.

    Why the Draft?

    I got into the NFL about eight years ago, mostly by accident. ESPN highlights, YouTube breakdowns, the TV show Ballers—it pulled me in, and at some point I became a genuine football obsessive. A Rams fan, specifically, despite half my cousins living in Boston and bleeding Patriots blue. I watch everything now. Regular season, playoffs, Combine, Super Bowl, even the Pro Bowl. But the Draft is the one that got me building things.

    If you’re not familiar with the NFL Draft, it’s essentially a three-day event where teams take turns selecting college players. But that undersells it. The Draft is an industry within an industry. In the months leading up to it, there’s a parallel economy of mock drafts, scouting reports, pro day workouts, Combine measurements, leaked intel, and endless speculation. Entire media careers are built on predicting who goes where. And behind the scenes, billion-dollar organizations are making decisions that will define their franchises for years—sometimes based on a tenth of a second in the 40-yard dash, sometimes based on a gut feeling from a scout who watched a kid play in the rain.

    What fascinates me is how opaque those decisions are. A team passes on a prospect everyone expected them to take. Another team reaches for a guy no one’s heard of. The internet melts down. And then three years later, that “reach” is a Pro Bowler and the “steal” is out of the league.

    I wanted to understand what’s actually happening. Not the narratives, not the hot takes—the patterns. What makes front offices tick? What do the Rams, specifically, look for when they’re spending draft capital on a player? Could I reverse-engineer their preferences from 12 years of data?

    So I went at it.

    The Data Pipeline

    I wrote a Playwright scraper to pull Combine measurements for over 6,000 players—40 times, verticals, broad jumps, the works. It was my first web scraping since my master’s studies (shout-out to Professor Fernandes, who gave me an excellent grade on that project). I exported PFF college stats going back to 2014, tracking everything from coverage grades to route-running scores to yards after catch. And I pulled SP+ ratings from CollegeFootballData.com to adjust for strength of schedule, because level of competition matters enormously in college, where future professional athletes might be lining up against future accountants and computer scientists.
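
    For flavor, the scraping loop was roughly the following (a minimal sketch: the URL, the table selector, and the column layout are placeholders, not the real site’s markup):

    from playwright.sync_api import sync_playwright
    import csv

    COMBINE_URL = "https://example.com/combine-results?year={year}"  # placeholder URL

    def scrape_combine_year(year):
        rows = []
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            page = browser.new_page()
            page.goto(COMBINE_URL.format(year=year))
            # Hypothetical table markup; real selectors depend on the site being scraped
            for tr in page.query_selector_all("table.results tbody tr"):
                cells = [td.inner_text().strip() for td in tr.query_selector_all("td")]
                if cells:
                    rows.append(cells)
            browser.close()
        return rows

    with open("combine.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for year in range(2014, 2026):
            writer.writerows(scrape_combine_year(year))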

    Three Projects, Three Approaches

    Each project pushed me to try a different statistical approach, depending on what the data demanded.

    Wide Receivers: Clustering for Archetypes

    For receivers, I used K-means clustering to group players into athletic archetypes. The interesting finding: the Rams’ best picks don’t cluster with the athletic freaks. They cluster with the technicians who block. That tells you something about scheme fit that raw athleticism scores miss entirely. I actually did a 3D cluster chart for the first time ever (see below).
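
    The clustering itself is only a few lines with scikit-learn. A sketch, with an illustrative feature list and k=5 rather than my exact setup, and a hypothetical drafted_by_rams flag:

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    df = pd.read_csv("wr_combine_and_pff.csv")  # hypothetical merged dataset
    features = ["forty", "vertical", "broad_jump", "route_grade", "yac_per_rec", "block_grade"]

    X_df = df[features].dropna()
    X = StandardScaler().fit_transform(X_df)          # standardize so no metric dominates
    kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)
    df.loc[X_df.index, "archetype"] = kmeans.labels_

    # Which archetype do the Rams' actual picks fall into?
    print(df.loc[df["drafted_by_rams"] == 1, "archetype"].value_counts())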

    Quarterbacks: Weighted Composite Scoring

    The quarterback project forced me to think about how to weight categories against each other. I built a percentile-based composite scoring system covering 59 PFF metrics across 12 categories—big time throws, deep ball accuracy, pocket management, play action efficiency. The goal was to create a “Stafford Profile” and measure how closely prospects match it.
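
    Mechanically, it boils down to percentile ranks rolled up into weighted category scores. Here is a simplified sketch; the category names, metric columns, and weights below are illustrative, not the real 12-category setup:

    import pandas as pd

    qb = pd.read_csv("qb_pff_metrics.csv")  # hypothetical: one row per prospect, one column per metric

    categories = {  # illustrative categories and weights (weights sum to 1.0)
        "deep_ball":   (["deep_accuracy_pct", "deep_big_time_throw_rate"], 0.30),
        "pocket":      (["pressure_grade", "avg_time_to_throw"], 0.25),
        "play_action": (["pa_yards_per_attempt", "pa_grade"], 0.20),
        "ball_care":   (["turnover_worthy_play_rate"], 0.25),
    }

    metric_cols = [m for metrics, _ in categories.values() for m in metrics]
    # Percentile rank within the class; lower-is-better metrics (e.g. turnover rate)
    # would need their ranks flipped, which is omitted here for brevity
    pct = qb[metric_cols].rank(pct=True) * 100

    for name, (metrics, weight) in categories.items():
        qb[f"{name}_score"] = pct[metrics].mean(axis=1)

    qb["composite"] = sum(qb[f"{name}_score"] * weight
                          for name, (_, weight) in categories.items())

    print(qb.sort_values("composite", ascending=False)[["player", "composite"]].head(10))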

    Defensive Backs: Small Sample Rigor

    The defensive back project was my most statistically rigorous—because it had to be. With only 16 cornerbacks and 13 safeties in my Rams draft history dataset, I couldn’t afford to chase noise.

    I ran one-sample t-tests on 100+ metrics to determine which preferences were statistically significant versus random variation. Metrics that passed got weighted by Cohen’s d (effect size). For player comparisons, I used cosine similarity on rate-based metrics—interceptions per target instead of raw INTs—so that lockdown corners who don’t get targeted still match correctly.
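
    The statistical machinery here is standard scipy/numpy. A sketch, assuming rams_db is a pandas DataFrame of the Rams’ DB acquisitions and league_means a dict of league-average values per metric:

    import numpy as np
    from scipy import stats

    def significant_preferences(rams_db, league_means, alpha=0.05):
        """One-sample t-test of Rams DBs against the league mean for each metric;
        metrics that pass get a weight proportional to Cohen's d."""
        weights = {}
        for metric, league_mean in league_means.items():
            sample = rams_db[metric].dropna()
            t_stat, p_value = stats.ttest_1samp(sample, league_mean)
            if p_value < alpha:
                cohens_d = (sample.mean() - league_mean) / sample.std(ddof=1)
                weights[metric] = abs(cohens_d)
        return weights

    def cosine_similarity(a, b):
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # e.g. compare a prospect's rate-based profile (INTs per target, etc.)
    # against the average Rams DB profile:
    # fit_score = cosine_similarity(prospect_rates, rams_profile_rates)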

    The result: a weighted scoring model where production counts for 70% and athleticism for 30%, derived from validating against every DB the Rams have acquired since 2012.

    What the Models Actually Found

    Here’s where things got interesting. The statistical analysis revealed preferences I didn’t expect.

    The Rams love man coverage specialists. Their cornerbacks average 1.2 interceptions in man coverage per season—versus 0.2 for the league average. That’s 6x the production. The z-score on man coverage INTs was +1.88, the strongest signal in the entire dataset.

    Size doesn’t matter. I assumed there’d be a physical archetype. There isn’t. The Rams have acquired corners ranging from 5’9″ Cobie Durant to 6’3″ Ahkello Witherspoon to 166-pound Emmanuel Forbes. Production and traits trump measurables.

    Speed is overrated—agility isn’t. For corners, the 3-cone drill and vertical leap are weighted 2.2x higher than the 40-yard dash. The Rams will overlook pedestrian straight-line speed if you test well in change-of-direction drills.

    When I validated the model retrospectively, it correctly identified players like Durant, Trevius Hodges-Tomlinson, and Witherspoon as strong scheme fits before they were acquired. The most telling example: Kamren Kinchens ran a 4.85 40 at the Combine (historically slow for a safety) but posted a 90.7 coverage grade in college. The model flagged him as a fit despite the speed concerns. He’s now one of PFF’s highest-graded safeties in 2025.

    What’s Next

    The 2026 Combine hasn’t happened yet, so right now I’m working with production scores only. Once the athletic testing drops, I’ll plug in the numbers and see what shakes out. There’s something satisfying about watching a dashboard populate with fresh data—prospects shuffling up and down the rankings, new names appearing that weren’t on my radar before.

    I post the results on Rams forums sometimes. Methodology breakdowns, prospect rankings, the occasional deep dive on a player nobody’s talking about. People seem to enjoy it—there’s a good community of fans who actually want to dig into the numbers rather than just react to mock drafts. Someone will ask “what about this guy from Toledo?” and I’ll go run him through the model just to see what comes out.

    I’ve learned a lot about applied statistics from these projects—practical problems have a way of making concepts stick. But mostly, it’s just satisfying to build things around something you actually care about.

    I’ll publish my 2026 board once the Combine numbers are in. If you want to argue about whether the Rams should take a corner or a receiver in round two, or just talk some football, you know where to find me!

  • v1.4: the greatest hallucination I’ve ever seen

    Recently, I’ve been working through some things I’d been meaning to catch up on. In the AI space, it always feels like you’re way behind on concepts! But behind the constant flow of inventions (and sometimes disguised reinventions), there are core concepts worth knowing if you want to build reliable systems.

    One concept I’d wanted to work with for some time now is multi-agent architecture.

    I previously worked exclusively with single-agent systems, but recently started a project where multiple agents made more sense. The benefits were clear: separation of concerns meant each agent could focus on its specific task, specialization allowed optimization for particular domains, and the overall workflow became cleaner to maintain.
    The architecture felt right on paper, and it actually delivered good results.

    I’d been curious about MCP servers for a while and saw this as the perfect opportunity to integrate them into my agents. To get my feet wet, I started by adding a few to my agentic coding assistant in my IDE. The AWS documentation MCP server was the main one—it gave my assistant access to current docs, which helped make more informed decisions about infrastructure choices. Worked perfectly on the first try.

    Great.

    Now on to the actual project.

    Typically, when I need to connect my agents to external data sources or custom logic, I write Lambda functions and define them with OpenAPI schemas as action groups. I’ve done this successfully multiple times—it’s the standard Bedrock pattern and works well.
    But this time, I wanted to try something different. Instead of the Lambda + action group approach, I’d write my function and expose it as an MCP server through AgentCore’s Gateway, which lets Bedrock agents consume MCP servers as if they were native tools.
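
    (For reference, the Lambda + action group pattern I was swapping out looks roughly like this on the Lambda side. This is a from-memory sketch with a made-up /orders/status operation, so verify the event and response field names against the current Bedrock docs rather than taking my word for it.)

    import json

    def look_up_order(order_id):
        # placeholder for the real business logic
        return "shipped" if order_id else "unknown"

    def lambda_handler(event, context):
        # Bedrock passes the apiPath/httpMethod it chose from the OpenAPI schema,
        # plus any parameters the model filled in
        api_path = event.get("apiPath")
        params = {p["name"]: p["value"] for p in event.get("parameters", [])}

        if api_path == "/orders/status":  # hypothetical operation from the schema
            body = {"status": look_up_order(params.get("orderId"))}
        else:
            body = {"error": f"unknown path {api_path}"}

        return {
            "messageVersion": "1.0",
            "response": {
                "actionGroup": event.get("actionGroup"),
                "apiPath": api_path,
                "httpMethod": event.get("httpMethod"),
                "httpStatusCode": 200,
                "responseBody": {"application/json": {"body": json.dumps(body)}},
            },
        }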

    So far so good.

    I wrote the code, started the implementation. But then, I had a thought:

    Why not ask the IDE agentic assistant what it thinks of the approach?

    And so I did.

    The assistant worked dutifully. It read through the AgentCore documentation, examined my code, and confidently told me I was overengineering the solution. There was a direct database integration feature in the gateway that would let me point straight to my data source—no need for complex MCP server functions at all.
    Curious, I asked for more details.
    The coding assistant did its research, then delivered: Python scripts, beautiful IaC templates, documentation snippets, step-by-step implementation instructions. Everything from start to finish, done for me.

    I just had to click “Approve”.

    I was genuinely impressed and about to rip out my working implementation to use this cleaner approach.
    Then a thought nagged at me:

    How come I’d never heard of this database integration feature before?

    The suspicion didn’t come from expertise—I’m new to AgentCore and MCP servers. But I’d spent an unbelievable amount of time in the official AgentCore GitHub repository, working through their tutorials, reading their code. I’d never seen this feature mentioned anywhere.
    So I asked the assistant for a link so I could verify it myself.
    The link took me to a page with no mention of the feature.
    Incredulous, I opened a new script and tried importing the libraries the assistant had written. None of them existed.
    All of it was fake.
    Not even the IaC template was real. When I tried to deploy it, all I got were errors.

    This was easily the greatest hallucination I’ve seen from an LLM to date. Everything looked pristine: well written, seemingly in accordance with AWS documentation. But it was not. The code was so good that, for a moment, I genuinely wished the database integration did exist.

    I felt like Jabez Wilson in “The Red-Headed League”—the Sherlock Holmes story where a man is hired for a seemingly legitimate but suspiciously easy job copying encyclopedia entries, only to discover the entire operation was an elaborate fabrication designed to keep him occupied while criminals robbed a bank. My AI assistant had sent me on an equally convincing wild goose chase, complete with official-looking documentation and plausible code.
    It was a stark reminder: no matter how sophisticated these agentic systems are at present, their outputs need verification. The assistant had access to real documentation through the MCP server, confidently synthesized information, and presented it with authority. But confidence isn’t correctness.


    Bamboozled, led astray, run amok and feeling a bit dumb, I kept my old, reliable MCP server implementation—the one that actually worked.

    And I uninstalled my assistant, for good measure.

  • I once built a data analysis agent. Simple stuff – query the database, answer questions about transactions and campaigns. User asks something, agent writes SQL, runs it, gives an answer.

    Worked great until I asked: “Show me our Q1 2025 performance.”

    The agent pulled the data perfectly. Then told me: “This is a forecast for future quarters. Q1 2025 hasn’t happened yet.”

    It was July 30, 2025. Q1 ended four months ago.

    Right. The LLM has a knowledge cutoff in 2024. From its perspective, anything after that literally hasn’t happened yet. My database has real transactions from 2025 sitting right there. Actual transactions. Real customers. But when the agent saw dates like “2025-01-15”, it went: “Future date. Must be a projection.”

    Me: "What was our revenue in June 2025?"
    Agent: "Based on the forecasted data for June 2025..."
    Me: No. That already happened.
    Agent: *confused*
    
    

    Easy fix though. Just tell it to check today’s date first.

    Attempt 1: Just Ask Nicely

    Added this to the prompt:

    Before analyzing any time-based data, use today's date to determine 
    whether transactions are historical or forecasted.
    

    Tested it. “Show me April 2025 performance.”

    Agent: “This is projected data for April 2025, which occurs in the future.”

    It was August 2025 by this point; April had long since passed. The agent completely ignored my instruction.

    Attempt 2: Give It the Tool

    Okay, maybe it needs a concrete way to check. Added this:

    You can check the current date by running: SELECT CURRENT_DATE;
    Use this to determine if data is historical or forecasted.
    

    Asked: “What happened in May 2025?”

    Agent gets the data, analyzes it: “These are forecasted figures for May 2025.”

    Checked the logs. It never ran SELECT CURRENT_DATE;. Just went straight to querying transactions.

    Attempt 3: MANDATORY

    Alright, let’s be more forceful:

    MANDATORY: Always execute 'SELECT CURRENT_DATE;' FIRST before 
    answering any question. This tells you what today's actual date is.
    Data before that date is historical. Data after that date is forecasted.
    

    Asked about Q2 2025.

    Logs show:

    SELECT CURRENT_DATE;
    -- Returns: 2025-08-15
    
    SELECT SUM(revenue) FROM transactions 
    WHERE date >= '2025-04-01' AND date <= '2025-06-30';
    
    

    Great! It checked!

    Agent response: “The Q2 2025 revenue forecast shows…”

    I stared at this for a solid minute. It RAN the query. It SAW today is 2025. Still called Q2 a forecast.

    Turns out agents are lawyers. They don’t break rules – they find loopholes. I said “check the date.” Didn’t say to actually use it for anything.

    Attempt 4: Close the Loophole
    MANDATORY: Always Execute 'SELECT CURRENT_DATE;' first for time context.
    The result is the actual current date, in the real world.
    This is the ONLY source of truth for current date and time context.
    Data before that date is historical data.
    Data after that date are projections and forecasts.
    

    Asked about January 2025.

    SELECT CURRENT_DATE;  -- 2025-08-15
    SELECT * FROM transactions WHERE MONTH(date) = 1 AND YEAR(date) = 2025;
    
    

    Agent: “Based on the forecasted data for January 2025…”

    Lost my mind a little bit here.

    Sometimes it would run the query but ignore the result. Sometimes it would skip the query entirely. Sometimes it would run it, see the correct date, then add “However, according to my knowledge cutoff…” Every time I fixed one thing, it found another way around it.

    Attempt 5: More Holes, More Patches

    Kept escalating:

    "Execute SELECT CURRENT_DATE; as your FIRST action for EVERY question. No exceptions."

    Inconsistent results. Would work for some questions, not others.

    "Do not use your knowledge cutoff as a substitute."

    It would acknowledge this then use the knowledge cutoff anyway.

    "If you fail to execute SELECT CURRENT_DATE; first, abort."

    It just wouldn’t abort. This went on for a while.

    Attempt 6: Wait, Why Are You Checking It?

    Then something clicked. Maybe the agent was running the query mechanically but not connecting it to the analysis. “MANDATORY” made it execute the query. But it didn’t explain why it needed the date.

    Changed it to:

    MANDATORY: Always Execute 'SELECT CURRENT_DATE;' first for time context
    before answering any question.
    

    That phrase – “for time context” – actually helped. The agent started using the date as a reference instead of just running the query and moving on.

    What Finally Worked

    After three days of this:

    MANDATORY: Always Execute 'SELECT CURRENT_DATE;' first for time context 
    before answering any question. The result is the actual current date, 
    in the real world. This is the ONLY source of truth for current date 
    and time context. Data before that date is historical data, data after 
    that date are projections and forecasts.
    

    Four sentences. Each one closing a specific loophole:

    “for time context” – Connected the action to the reasoning

    “actual current date, in the real world” – Not a theoretical date, not a test

    “ONLY source of truth” – Don’t trust your knowledge cutoff over this

    “Data before = historical, after = forecast” – Spell out the exact logic

    That’s it. One SQL query and four sentences. Took three days to figure out which four sentences.

    The flow now: Agent queries CURRENT_DATE → gets 2025 → queries data → sees 2025 → compares → “past!” → calls it historical. Before, it would skip straight to the data query and use its knowledge cutoff (April 2024) as the reference point.
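
    The comparison the agent finally makes is trivial once it actually happens. In plain Python it amounts to nothing more than this (an illustration of the logic, not code that lives in the agent):

    from datetime import date

    def classify_period(row_date: date, today: date) -> str:
        # The rule the prompt spells out: on or before today = historical, after = forecast
        return "historical" if row_date <= today else "forecast"

    # classify_period(date(2025, 1, 15), date(2025, 8, 15))  -> "historical"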

    The funny part is this whole thing happened in September 2025. To the LLM, September 2025 is still “the future.” Which means if I asked it to analyze this blog post’s publication date, it would tell me it’s a forecasted blog post that hasn’t been written yet. While you’re reading it.

    Agents don’t ignore instructions on purpose. They just find every possible way to interpret what you said that isn’t what you meant. The more vague you are, the more creative they get with loopholes. “Use the date” doesn’t mean “use it for temporal reasoning.” Could mean anything. I needed to connect the action to the purpose. “Execute this query” versus “Execute this query for time context” – one works, one doesn’t. And “ONLY source of truth” matters way more than it should. Without that phrase, the agent would sometimes trust its knowledge cutoff over data it literally just retrieved from the database.

    Writing prompts is like drafting a contract with someone who will absolutely find any loophole. Simple fix in the end. Four sentences. Three days to find them. The agent now checks the date first and actually uses it. Transaction analysis works. No more debates about what year it is.

    Still feels weird that I spent three days arguing with software about the current date.


    Hit similar bugs with LLM knowledge cutoffs? I’d be curious how you fixed it. Drop a comment.

  • I was building an access control system for a multi-tenant application. Pretty standard stuff – internal employees get instant access, external partners need invitations. We already had Azure AD authentication working, just needed to automate adding people to the security group instead of having the IT team do it manually through the portal.

    I figured this would take maybe two days. Azure has B2B invitations built in, Microsoft has documentation, people do this all the time.

    Three weeks later, after trying OAuth flows I didn’t need, debugging URL encoding issues, and having several “oh that’s why it works that way” moments, I finally got it working.

    Here’s what actually happened.

    Starting Simple (Too Simple)

    First step was just automating the security group management for internal users. Set up an app registration in Azure AD, got it made owner of our security group, got the User.Read.All permission. Built a quick function using Microsoft Graph API:

    def add_user_to_security_group(email):
        user = search_user_by_email(email)
        user_id = user['id']
        
        endpoint = f"/groups/{SECURITY_GROUP_ID}/members/$ref"
        payload = {"@odata.id": f"https://graph.microsoft.com/v1.0/users/{user_id}"}
        
        graph_api_request('POST', endpoint, payload)

    Tested it with my own email. Worked perfectly! I was feeling pretty good about this.

    Then I tested with an external partner user – someone from a partner company who was already in our system as a B2B guest.

    User not found.

    I checked the Azure portal. The user definitely exists. I can see them right there. What the hell?

    The External User Rabbit Hole

    Turns out external B2B users don’t have the email address you’d expect. Their User Principal Name in your tenant looks like john.doe_partner-company.com#EXT#@yourtenant.onmicrosoft.com.

    So when I search for john.doe@partner-company.com, Graph API is like “never heard of them.”

    I had to change my search to look across multiple fields – the mail property, otherMails, and proxyAddresses. Eventually found them that way.
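
    The broadened lookup ended up looking something like this, reusing the same graph_api_request helper style as above. The Graph filter syntax here is from memory, and some of these filters need the ConsistencyLevel: eventual header plus $count, so treat it as a sketch:

    def search_user_by_email(email):
        # Check the fields where an external B2B guest's address can actually live
        filters = [
            f"mail eq '{email}'",
            f"otherMails/any(m:m eq '{email}')",
            f"proxyAddresses/any(p:p eq 'SMTP:{email}')",
        ]
        for f in filters:
            result = graph_api_request('GET', f"/users?$filter={f}")
            users = result.get('value', [])
            if users:
                return users[0]
        return None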

    Great! Got their UPN, tried to add them to the security group.

    400 Bad Request.

    I stared at this error for way too long. The UPN looked right. The API call looked right. I added print statements everywhere (which I forgot to remove and had to clean up later).

    Finally discovered that the # character in the UPN needs to be URL encoded. So #EXT# becomes %23EXT%23.

    from urllib.parse import quote
    encoded_upn = quote(upn, safe='')
    

    One line of code. Took me half a day to figure out.

    Okay, so now internal users work and existing external users work. Time to add B2B invitations for NEW external users.

    The Invitation Timing Problem

    Built a function to send B2B invitations through the Graph API. Pretty straightforward:

    def send_b2b_invitation(email):
        invitation_data = {
            "invitedUserEmailAddress": email,
            "inviteRedirectUrl": "https://portal.azure.com",
            "sendInvitationMessage": True
        }
        return graph_api_post("/invitations", invitation_data)
    

    Then I figured I’d just send the invitation and immediately add them to the security group:

    send_b2b_invitation(email)
    add_user_to_security_group(email)

    Nope. Failed. Permission denied.

    Checked the Azure portal and saw the invited user had externalUserState: "PendingAcceptance". Tried adding them to the security group anyway. Still failed.

    Turns out Azure AD has a hard rule: you cannot add users to security groups until they accept the invitation. Their state has to change from “PendingAcceptance” to “Accepted” first. No exceptions, no workarounds.

    So now I had a timing problem. The flow is:

    1. Send invitation
    2. Wait for user to accept (could be minutes, hours, or days)
    3. THEN add them to the security group
    4. THEN they can actually log in

    But how do I know when step 2 happens so I can do step 3?

    The OAuth Detour (When I Overcomplicated Everything)

    My first idea was to get clever with OAuth. I’d customize the invitation redirect URL to point to my app, use OAuth to capture their identity after they accept, then add them to the group.

    I spent days reading about OAuth flows. Authorization code flow with PKCE. Token exchanges. State parameters for CSRF protection. I was building this whole callback handler with session management and token validation.

    redirect_url = f"https://myapp.com/oauth/callback?email={email}&state={random_token}"

    I was going to implement the full OAuth dance – they accept the invitation, get redirected to my app, I’d trigger the auth flow, get their tokens, validate their identity…

    Then I did more research and realized: after accepting a B2B invitation, users are already authenticated by Azure AD. They don’t need another OAuth flow. I was building a solution to a problem I didn’t have.

    I also spent time researching whether Azure AD passes user information in HTTP headers after invitation acceptance. It doesn’t, unless you’re using Azure App Service with Easy Auth, which we weren’t.

    Back to square one.

    The Obvious Solution I Should Have Seen Earlier

    After all that overengineering, the answer was embarrassingly simple.

    Stop trying to detect WHO accepted the invitation. Just remember who you invited in the first place.

    Here’s what I ended up doing:

    When sending the invitation, generate a secure random token:

    import secrets
    state_token = secrets.token_urlsafe(32)

    Store that token in your database along with the email:

    {
        'token': state_token,
        'email': email,
        'created_at': datetime.utcnow(),
        'expires_at': datetime.utcnow() + timedelta(days=7),
        'used': False
    }

    Include the token in the invitation redirect URL:

    redirect_url = f"https://myapp.com/grant-access?state={state_token}"
    
    invitation_data = {
        "invitedUserEmailAddress": email,
        "inviteRedirectUrl": redirect_url,
        "sendInvitationMessage": True
    }

    Then when the user accepts the invitation, Azure redirects them to your page with that token. You look up the token in your database, get the email, and NOW you can add them to the security group because they’ve already accepted.

    def handle_grant_access(request):
        token = request.args.get('state')
        email = validate_and_consume_token(token)
        
        if not email:
            return "Invalid or expired invitation"
        
        # They've accepted by now, so this works
        add_user_to_security_group(email)
        
        return "Access granted! You can now log in."

    No OAuth. No webhooks. No polling. Just a simple token that links the invitation to the acceptance.

    By the time they land on your page, their state has changed to “Accepted” so the group membership operation actually works.

    The Token Validation Details

    You need to validate the token properly:

    def validate_and_consume_token(token):
        record = db.get_invitation_by_token(token)
        
        if not record:
            return None
        
        # Check expiration
        if datetime.utcnow() > record['expires_at']:
            return None
        
        # Check if already used
        if record['used']:
            return None
        
        # Mark as used (important!)
        db.mark_token_as_used(token)
        
        return record['email']

    I originally forgot to mark tokens as used. Users could refresh the page and I’d try to add them to the security group multiple times. Not harmful (Azure just says “already a member”) but wasteful.

    Also, I started with 24-hour token expiration, but some users don’t accept invitations immediately. Bumped it to 7 days. Works much better.

    Other Mistakes Along the Way

    I had a function to remove users from the security group that was returning success in almost every case, even when it failed:

    def remove_user_from_group(email):
        user = search_user_by_email(email)
        if not user:
            return True  # "Success"?
        
        try:
            remove_from_group(user['id'])
            return True
        except:
            return True  # Also "success"?

    The UI showed “Successfully removed” even when nothing happened. Had to fix that to actually check if the user was in the group first and return meaningful status.
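
    The fixed version checks membership first and returns a status that actually means something. A sketch in the same helper-function style as the rest of this post; is_member_of_group is a hypothetical helper, and logger is the module logger shown a bit further down:

    def remove_user_from_group(email):
        user = search_user_by_email(email)
        if not user:
            return {"removed": False, "reason": "user_not_found"}

        if not is_member_of_group(user['id'], SECURITY_GROUP_ID):  # hypothetical helper
            return {"removed": False, "reason": "not_a_member"}

        try:
            remove_from_group(user['id'])
            return {"removed": True, "reason": None}
        except Exception as exc:
            logger.error(f"Failed to remove {email} from group: {exc}")
            return {"removed": False, "reason": "graph_api_error"}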

    Also, remember all those print statements I added during debugging? Yeah, those made it into production. Spent an afternoon cleaning them up and replacing with proper logging.

    # Don't do this
    print(f"DEBUG: Searching for user {email}")
    
    # Do this
    import logging
    logger = logging.getLogger(__name__)
    logger.info(f"Provisioning access for {email}")

    Learn from my mistakes.

    The Permission Confusion

    Early on I got 403 “Authorization_RequestDenied” errors when trying to add users to the security group.

    I didn’t understand why. The IT team had made our app registration the owner of the security group. Shouldn’t that be enough?

    Turns out you need both – being the group owner AND having the Graph API permission User.Read.All. The ownership lets you modify that specific group’s membership. The API permission lets you query users and make API calls.

    It’s actually more secure this way – our app can only manage this one security group, not all groups in the tenant. But the documentation doesn’t make this super clear. Took some back-and-forth with IT to figure out the right permission combination.

    What It Looks Like Now

    The flow in production:

    1. Admin adds external user: user@partner.com
    2. System generates random token: xK8mN...
    3. Token stored in database, expires in 7 days
    4. B2B invitation sent with redirect: https://app.com/grant-access?state=xK8mN...
    5. User gets invitation email
    6. User accepts invitation in Azure
    7. User redirected to our grant-access page
    8. Page validates token, gets email from database
    9. Page adds user to security group (works now because state is “Accepted”)
    10. Success page: “Access granted! Go to application”
    11. User can immediately authenticate and access everything

    Takes about 60 seconds from invitation to active access. Zero manual steps after the admin clicks send.

    Permissions You Actually Need

    Your Azure AD app registration needs:

    • User.Read.All – to search for users by email
    • User.Invite.All – to send B2B invitations
    • Must be owner of the security group you’re managing

    That combination lets you manage the specific group without needing broader GroupMember.ReadWrite.All permissions.

    Things I’d Tell My Past Self

    Test with actual external users early. Don’t just test with your corporate email. Spin up a Gmail account and test the full external user flow. You’ll hit the UPN encoding issue immediately instead of discovering it in production.

    Simple solutions beat clever solutions. I wasted days on OAuth flows and webhook research. The state token approach is straightforward and it works.

    Azure B2B has hard requirements. You cannot add users to security groups before they accept invitations. Design your flow around this, not against it.

    External users are different from internal users. Different UPN formats, need comprehensive search, require URL encoding. Don’t assume what works for internal users will work for external.

    Error messages should actually help users. “Invalid token” is useless. “This invitation link has expired. Please contact your administrator for a new invitation” tells them what to do.

    Use proper logging from day one. Not print statements you’ll have to clean up later.

    When This Makes Sense

    This pattern is overkill if you’re only inviting a handful of users manually. But if you’re onboarding partners, contractors, or clients at scale – dozens or hundreds of external users – it eliminates manual work and support tickets.

    For us, when the product team says “we need to onboard 50 customers next week,” the answer is just “send me the email list.”

    Final Thoughts

    What I thought would take two days took three weeks. But most of that was learning what doesn’t work. The final solution is actually pretty simple – a few thousand lines of code and half of my life expectancy.

    The Azure AD documentation tells you what the APIs do. It doesn’t tell you how to orchestrate them into a working system that handles real users, edge cases, and zero manual intervention.

    Now I know way more about Azure AD B2B invitations than I ever wanted to. But at least I can share the pain so maybe you don’t have to make all the same mistakes I did.

    If you’re building something similar and want to compare notes, I’d love to hear about it. There are probably better ways to solve some of these problems that I haven’t thought of.


    Have you dealt with Azure AD B2B invitations? What challenges did you hit? Let me know in the comments.

  • Image generated with Dall-E

    AI now accounts for over half the code produced in some organizations, yet 48% of AI-generated code snippets contain vulnerabilities, a notably higher rate than human-written code. The industry narrative is clear: AI-generated code is fundamentally less reliable in production.

    But what if we’re blaming the wrong thing?

    AI-generated code doesn’t come with a self-destruct mechanism. There’s no magical property that makes it spontaneously combust when it encounters a production server. It’s not programmed to self-destruct at 3 AM on a Tuesday, Mission Impossible style.

    When Claude writes a Python function, it follows the same syntax rules and runs through the same interpreter as human-written code. The runtime doesn’t check git blame before executing. When AI code fails in production, it fails for the same reasons any code fails: poor logic, inadequate error handling, integration issues, or infrastructure problems. The failure modes are identical.

    If AI and human code behave identically at runtime, why do we see that 48% vulnerability rate? The answer reveals something about human behavior, not code behavior.

    Here’s what actually happens: developers treat AI-generated code differently during development. When we write code ourselves, we naturally add error handling because we’ve been burned before. We test edge cases because we remember when things broke. We review carefully because we know our limitations.

    With AI code, something shifts. It looks polished and complete, so we treat it as more finished than it actually is. We’re less likely to test thoroughly, review critically, or add defensive programming. The higher vulnerability rate isn’t because AI writes worse code — it’s because we apply less rigorous development processes to code we didn’t write ourselves.

    The Process Reality

    During my first internship week in France, I nearly wiped out an entire production database with a poorly written SQL Server stored procedure. The only thing that saved the company’s data was my supervisor’s wisdom in giving me access to a separate database environment because he knew junior engineers mess up and write bad code. Regularly.

    This wasn’t mysterious AI unpredictability — it was a classic case of inexperience, insufficient testing, and inadequate code review. The stored procedure I wrote had a logic flaw that could have cascaded through the entire database. No language model was involved — just a human developer who didn’t fully understand the implications of the SQL they were writing.

    Knight Capital lost $440 million in 45 minutes because of a human coding error that deployed the wrong algorithm. (Fortunately for me, I only had to buy the whole team breakfast when my stored procedure failed.) These were engineering process failures, not inherent code reliability issues. The same process failures happen with AI code, but more systematically.

    There’s another important factor: hastily written code will eventually fail in testing just as it fails in production, given enough time and the right conditions. What appears to be “AI code failing in production” is often just bad code revealing itself under specific conditions that weren’t explored during testing.

    A poorly written function will crash when it encounters null values, regardless of authorship. A database query without proper error handling will timeout under load. An API integration without retry logic will fail during network issues. Similarly, misunderstood code will fail if you don’t fully grasp its purpose — it might be doing something entirely different than what you intended, whether you copied it from Stack Overflow or generated it with an AI tool. These failures aren’t production-specific — they’re condition-specific.

    If your testing process had encountered the same edge cases, input combinations, or load patterns that triggered the production failure, the code would have failed in your staging environment first. The failure happened in production because that’s where those particular circumstances first occurred, not because the code was inherently unreliable.

    AI’s Actual Problems

    AI does have genuine weaknesses that developers need to understand and account for:

    Over-engineering and Excessive Fallbacks: AI often generates unnecessarily complex code with excessive error handling and defensive programming. It might add try-catch blocks for every possible exception, create multiple fallback mechanisms, or implement overly cautious validation that makes the code harder to read and maintain. Claude 3.7 Sonnet was particularly notorious for this — creating solutions that looked robust but were actually over-engineered for simple problems. To recognise these, you need to know what you’re looking at.

    Knowledge and Context Limitations: AI code is often outdated due to knowledge cutoff points in language models and the rapidly evolving nature of programming frameworks. A model trained on data from early 2024 might suggest deprecated APIs or miss recently introduced best practices. AI also can’t see your entire codebase, so it suggests code that may be inconsistent with existing patterns, naming conventions, or architectural decisions.

    Security and Supervision Gaps: AI needs strict human supervision and won’t consider security vulnerabilities, performance implications, or practical concerns without explicit prompting. When asked for a solution, AI delivers functional code but rarely includes comprehensive error handling, logging, or graceful degradation. Human developers bring unconscious domain knowledge, contextual awareness, and implicit security concerns that AI models simply don’t possess.

    Scale and Integration Issues: AI is weak at coding complete applications — you can’t reliably “vibe code” a full video game or complex system. The output becomes incoherent at scale and lacks proper architecture. AI also occasionally suggests functions, methods, or libraries that don’t exist, particularly for less common frameworks, and struggles with complex business logic that requires deep domain understanding. Additionally, AI-generated code often uses different coding styles or architectural approaches than your existing codebase, creating maintenance headaches over time.

    However, for individual functions, utility scripts, and isolated components, AI can produce genuinely useful starting points. The key is understanding these limitations and treating AI as sophisticated autocomplete for experienced developers, not as a replacement for engineering judgment.

    The next time “AI code fails in production,” ask different questions: Was it tested as thoroughly as human-written code? Did it get proper code review? Were edge cases considered?

    Most AI code failures trace back to the same engineering fundamentals we’ve always dealt with: insufficient testing, inadequate review, poor integration practices. But understanding AI’s specific weaknesses lets us be more systematic about addressing them:

    For AI’s over-engineering tendency: Review generated code for unnecessary complexity. Ask “Can this be simpler?” before deploying.

    For context gaps: Always check that AI-generated code matches your existing patterns, naming conventions, and architectural decisions. Don’t let AI dictate your codebase style.

    For security blind spots: Treat AI code like junior developer code — assume security considerations are missing and review accordingly. Never deploy AI-generated code without explicit security review.

    For outdated knowledge: When AI suggests APIs or libraries, verify they’re current and still recommended. A quick documentation check can save hours of debugging deprecated methods.

    For integration issues: Test AI code more thoroughly at system boundaries. AI doesn’t understand your specific integration requirements, so failures often happen at the edges.

    The solution isn’t avoiding AI tools — it’s being intentional about where we apply extra scrutiny. Treat AI as a productive junior developer: capable of good work, but requiring experienced oversight in predictable areas.

    AI doesn’t change the rules of reliable software, but it does require us to be more explicit about applying them. The problem isn’t AI code — it’s assuming AI code needs less scrutiny than our own.

  • Hello everyone! My name is Edouard. The first post is always the hardest, but I thought I’d start writing about anything and the rest would come.

    276 was the telephone prefix for my hometown (on an island far, far away…), and when I started this blog, I thought I’d use it in the name to keep that childhood memory alive.

    I am passionate about AI, and I work with it everyday, so I thought I’d start this blog about AI engineering, my thoughts and experiences working in the field, some tutorials, many pop culture references, and where I think it’s all headed.

    AI is a hot topic these days, and I get it, you might be tired of it already. But I promise, I won’t tell you it’s there to take your job, or enable someone else to replace you, or that this framework will make you rich or that AI Engineer is the sexiest job of the 21st century (that title is infamously already taken). But I will share the cool things I find, debunk some concepts I believe to be myths, give my perspective on stuff, and even try to explain how to center a <div> when I finish learning it myself.

    This is not my first blog, but it’s been a while since I wrote. Looking forward to getting back to it and sharing what I’m seeing along the way.

    276AI is now live and I’ve got stories to tell!