• Valmond@lemmy.world · +53/-3 · 10 hours ago (edited)

    The fact that you can achieve a constant average query time, regardless of the hash table’s fullness, was wholly unexpected — even to the authors themselves.

    WTF?

    I mean, first of all, the “article” is just crap IMO, only hinting at “amazing” things, and it also contains basic errors like the above. It’s like they don’t understand complexity: a constant average time on what? A specific size of a hash table? Or do they mean it scales linearly with its size? Just go through the whole table and voilà, you have constant time already.

    So if they did find something, it’s not well explained in the article IMO. Shame, ’cause those kinds of things are interesting, IMO.

    Edit: looks like they “cheat” (in a good way) by “reordering” the table with smarter inserts (their ‘elastic hashing’), so they get their results without ever having to actually reorder the table. If so, it’s very smart but not groundbreaking. I’m sick and just skimmed the paper, don’t kill me.
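
    For anyone curious what the “depends on fullness” part looks like in an ordinary open-addressing table, here’s a throwaway sketch (plain linear probing, my own toy code, nothing to do with their elastic hashing): the average number of probes per successful lookup climbs as the load factor approaches 1, which is the behaviour the paper claims to sidestep.

```python
# Toy linear-probing hash table: the average probe count per successful
# lookup rises as the table fills. This is the classic baseline behaviour,
# NOT the paper's elastic hashing.
import random

def average_probes(capacity, load_factor):
    table = [None] * capacity
    keys = random.sample(range(10 * capacity), int(capacity * load_factor))
    for k in keys:
        i = hash(k) % capacity
        while table[i] is not None:      # linear probing on collision
            i = (i + 1) % capacity
        table[i] = k
    total = 0
    for k in keys:                       # look every key up again
        i = hash(k) % capacity
        probes = 1
        while table[i] != k:
            i = (i + 1) % capacity
            probes += 1
        total += probes
    return total / len(keys)

for lf in (0.5, 0.9, 0.99):
    print(f"load factor {lf:.2f}: {average_probes(1 << 16, lf):.1f} probes on average")
```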

      • deegeese@sopuli.xyz · +1 · 2 hours ago

        The article is discussing how to reduce the constant time factor that depends on the filling fraction, which is a speed–memory tradeoff made when creating the hash table.

        The innovation described allows the use of fuller tables that are resized less frequently, or faster insertion/retrieval at the existing filling fraction.
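
        As a minimal sketch of that knob (toy code with made-up numbers, not the paper’s scheme): an open-addressing table that doubles once its filling fraction crosses a threshold. Raise the threshold and you resize less often and waste fewer slots but probe longer; lower it and you get the opposite.

```python
# Toy open-addressing table that doubles its capacity once the filling
# fraction exceeds max_load. A higher max_load wastes fewer slots and
# resizes less often, at the price of longer probe sequences.
class SimpleHashTable:
    def __init__(self, max_load=0.75):
        self.max_load = max_load
        self.capacity = 8
        self.size = 0
        self.slots = [None] * self.capacity

    def _probe(self, key, slots, capacity):
        i = hash(key) % capacity
        while slots[i] is not None and slots[i][0] != key:
            i = (i + 1) % capacity       # linear probing
        return i

    def put(self, key, value):
        if (self.size + 1) / self.capacity > self.max_load:
            self._grow()                 # the expensive step everyone wants to avoid
        i = self._probe(key, self.slots, self.capacity)
        if self.slots[i] is None:
            self.size += 1
        self.slots[i] = (key, value)

    def get(self, key):
        i = self._probe(key, self.slots, self.capacity)
        return self.slots[i][1] if self.slots[i] is not None else None

    def _grow(self):
        new_capacity = self.capacity * 2
        new_slots = [None] * new_capacity
        for entry in self.slots:
            if entry is not None:
                new_slots[self._probe(entry[0], new_slots, new_capacity)] = entry
        self.slots, self.capacity = new_slots, new_capacity

table = SimpleHashTable(max_load=0.9)    # fuller table: fewer resizes, longer probes
for n in range(1000):
    table.put(n, n * n)
print(table.get(123), table.capacity)
```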

    • blx@lemmy.zip · +12/-1 · 8 hours ago (edited)

      “Constant average query time” is not that hard to understand. It means that sometimes access time is e.g. linear, and sometimes you get your content before executing the code. With a hash table large enough and full enough, this can be used to fetch content seconds, minutes, days, potentially years before the program even exists. That’s one hell of a breakthrough.

      [edit] /s, oops

  • tyler@programming.dev · +52 · 12 hours ago

    This is incredible, but why does the article end by stating that this might not have any immediate applications? Shouldn’t this immediately result in more efficient hash tables in everyday programming languages?

    • barsoap@lemm.ee · +39 · 5 hours ago (edited)

      After reading through the abstract, the article is pop-sci bunk: they developed a method to save additional space with constant-time overhead.

      Which is certainly novel and nice and all kinds of things, but it’s just a tool in the toolbox. Making things more optimal in theory says little about things being faster in practice, because the theoretical cost models never match what real-world machines are actually doing. In algorithm classes we learn to analyse sorting algorithms by the number of comparisons, and indeed the minimum necessary is O(n log n); in the real world it’s the number of cache misses that matters. CPUs can compare numbers basically instantly; getting the stuff you want to compare from memory to the CPU is where the time is spent. It can very well be faster to make more comparisons if it means you get fewer, or more regular (so that the CPU can predict and pre-fetch), data transfers.

      Consulting my crystal ball, I see this trickling down into at least the minds of the people who develop the usual KV stores, database engineers, etc. Maybe it’ll help, maybe it won’t; those things are already incredibly optimised. Never trust a data structure optimisation you didn’t benchmark. Never trust any optimisation you didn’t benchmark, actually. Do your benchmarks, you’re not smarter than reality. In case it does help, it’s going to trickle down into the standard data structure implementations that languages ship with.
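
      As a rough sketch of the “do your benchmarks” point (Python, so it shows constant factors rather than cache behaviour directly, and I’m deliberately not claiming which side wins on your machine):

```python
# Membership test on a small sorted list: an O(n) linear scan (the `in`
# operator) versus an O(log n) binary search (bisect). Asymptotically the
# scan is "worse"; whether it actually loses at this size is exactly the
# kind of thing only a measurement can tell you.
import bisect
import timeit

data = list(range(64))                   # small sorted list of ints
target = 57

def scan(xs, x):
    return x in xs                       # linear scan, O(n) comparisons

def bsearch(xs, x):
    i = bisect.bisect_left(xs, x)        # binary search, O(log n) comparisons
    return i < len(xs) and xs[i] == x

print("linear scan  :", timeit.timeit(lambda: scan(data, target), number=500_000))
print("binary search:", timeit.timeit(lambda: bsearch(data, target), number=500_000))
```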

      EDIT: I was looking at this paper, not this. It’s actually disproving a conjecture of Yao, who has a Turing Award, so certainly a nice feather to have in your cap. It’s also way more into the theoretical weeds than I’m comfortable with. This may have applications, or it may go along the lines of the Karatsuba algorithm: faster only if your data is astronomically large, while for (most) real-world applications the constant overhead outweighs the asymptotic speedup.

      • taladar@sh.itjust.works · +13 · 10 hours ago

        Also never even start optimizing until you profile and are sure the bit you are trying to optimize even matters to the overall performance of your program.
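
        The thirty-second Python version of that advice, as a generic sketch (nothing specific to the paper, and the function names are just stand-ins):

```python
# Profile first, then decide what is worth optimising. The two functions
# here are arbitrary stand-ins for "parts of your program".
import cProfile
import pstats

def heavy_number_crunching():
    return sum(i * i for i in range(200_000))

def light_bookkeeping():
    return {i: i for i in range(1_000)}

def main():
    for _ in range(20):
        heavy_number_crunching()
        light_bookkeeping()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)   # top 5 offenders
```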

    • OhNoMoreLemmy@lemmy.ml · +5 · 9 hours ago

      Hash tables are super efficient when they’re not nearly full. So the standard trick is just to resize them when they’re too close to capacity.

      The new approach is probably only going to be useful in highly memory-constrained applications, where resizing isn’t an option.

      • deegeese@sopuli.xyz · +2 · 4 hours ago

        Hash tables are used in literally everything and they always need to minimize resizing because it’s a very expensive operation.

        I suspect this will silently trickle into lots of things once it gets picked up by standard Python and JavaScript platforms, but that will take years.

    • rockSlayer@lemmy.world · +1 · 10 hours ago

      Infrastructural APIs are much slower to change, and in a lot of cases the use of those APIs is dependent on a specific version. The change will definitely occur over time as the practical limitations are discovered.

      • zkfcfbzr@lemmy.world · +15 · 11 hours ago

        Hash tables are often used behind the scenes. dicts and sets in Python both utilize hash tables internally, for example.
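
        You can even watch a dict resize its underlying table as it fills up (CPython implementation detail, so treat this as a sketch rather than a guarantee):

```python
# A dict is a hash table: as it fills up, CPython periodically allocates a
# bigger table, which shows up as jumps in the object's reported size.
import sys

d = {}
last = sys.getsizeof(d)
for i in range(100):
    d[i] = None
    now = sys.getsizeof(d)
    if now != last:                      # a jump means the table was resized
        print(f"resized after {len(d):3d} items: {last} -> {now} bytes")
        last = now
```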

        • source_of_truth@lemmy.world · +1/-1 · 9 hours ago (edited)

          I’ve only used Java, but Java hash tables are stupid fast in my experience; everything else in my crap programs was 1000 times slower than the hash table access or storage.

          Just reading the title, it’s talking about searching hash tables, which wasn’t something I was specifically doing.

      • lime!@feddit.nu · +7 · 11 hours ago

        anything that deserializes arbitrary json will put it into a hash table, right? it would definitely speed up the web.
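
        for example, in python every JSON object comes back as a dict, i.e. a hash table (quick sketch):

```python
# json.loads hands JSON objects back as dicts, i.e. hash tables, so any
# hash table speedup is felt by every JSON-heavy service.
import json

doc = json.loads('{"user": "lime", "likes": ["hash tables", "json"]}')
print(type(doc))       # <class 'dict'>  -- a hash table underneath
print(doc["user"])     # the lookup goes through that hash table
```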

  • BeigeAgenda@lemmy.ca · +13/-1 · 11 hours ago

    Here’s a link to the paper “Tiny Pointers” (https://arxiv.org/pdf/2111.12800). Those editors at Quanta Magazine write their summary in a strange fashion, for instance using x instead of n, which is what’s normally used in computer science when talking about big O notation.

  • Mubelotix@jlai.lu · +4/-8 · 11 hours ago

    Very sus. This article is not technically convincing at all. Probably a few students who dream of inventing the next big thing. Hash table complexity was mathematically proven to be generally optimal. If you think you optimized it, it’s because your tests don’t reflect the common real world.

    • Capsicones@lemmy.blahaj.zone · +11 · 9 hours ago (edited)

      The paper was published by IEEE and has professors as co-authors; only the second author is a student. And I wouldn’t dismiss it out of hand like that because of a magazine article. Students come up with breakthroughs all the time. The paper itself says it disproves Yao’s conjecture. I personally plan to implement and benchmark this because the results seem so good (rough harness sketch below). It could be another Fibonacci heap situation, but maybe not. Hash tables are so widely used that it might even be worthwhile to make special hardware to use this on servers, if our current computer architecture is the only thing that holds back the performance.

      Edit: author sequence
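
      Roughly the harness I have in mind (hypothetical: ElasticHashTable is just a placeholder name for whatever implementation of the paper’s structure I end up writing; the only thing actually benchmarked here is the plain dict baseline):

```python
# Hypothetical benchmark skeleton: ElasticHashTable is a placeholder for a
# future implementation of the paper's structure; dict is the baseline.
import random
import time

def bench(table_factory, n=1_000_000):
    table = table_factory()
    keys = random.sample(range(10 * n), n)
    t0 = time.perf_counter()
    for k in keys:
        table[k] = k                                 # inserts
    t1 = time.perf_counter()
    hits = sum(1 for k in keys if table[k] == k)     # lookups
    t2 = time.perf_counter()
    assert hits == n
    return t1 - t0, t2 - t1

insert_s, lookup_s = bench(dict)
print(f"dict baseline: insert {insert_s:.3f}s, lookup {lookup_s:.3f}s")
# Once an ElasticHashTable exists (with __setitem__/__getitem__), run:
# bench(ElasticHashTable)
```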